Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
Hybrid-policy reinforcement learning framework for multi-modal LLMs with exploration strategy to prevent entropy collapse during RL training.
Framework addressing class imbalance, overlap, and noise in multi-class learning using regional partitioning and meta-heuristic ensembles.
Layer gradient analysis method for identifying optimal layers for knowledge editing in LLMs while preserving model behavior.
ESM framework for merging multiple task-specific fine-tuned models using principal component analysis to reduce task interference.
Unified multimodal generative framework for crystal structure prediction and de novo generation across different modalities.
KnapSpec framework that reformulates self-speculative decoding as a knapsack problem to optimize LLM inference throughput through adaptive layer selection.
Extension of TabPFN foundation model to handle multimodal tabular data integrating images, text, and tables in unified framework.
Convex optimization-based clustering algorithm with LLM integration for analyzing biomedical literature and detecting trends in anti-aging research.
Neural wavefunction approach combining coupled-cluster theory with learnable molecular orbitals for quantum chemistry calculations.
Multi-task deep learning model for delivery delay prediction in complex logistics networks with uncertainty quantification.
Study of LLM truthfulness representations across domain-general and domain-specific directions using probe generalization across five truth types.
Discrete diffusion framework using sample-efficient estimators for generative modeling over discrete state spaces with conditional probabilities.
Curriculum learning approach that recursively decomposes complex datasets into simpler components using a teacher-student framework with step-by-step reasoning.
Surrogate models for cardiac mechanics using geometric encoding and generative augmentation in data-scarce clinical settings.
Time-series foundation models augmented with in-context learning to adapt to unseen tasks without fine-tuning.
QuantVLA: post-training quantization framework for Vision-Language-Action models to reduce compute and memory demands for embodied AI agents.
CaDrift: synthetic data generator using Structural Causal Models to create data streams with controlled distributional and covariate shifts for evaluating ML methods.
Analysis of representation geometry dynamics during chain-of-thought reasoning in LLMs using manifold capacity theory.
Self-supervised graph learning method incorporating fragment-level information for molecular representation learning.
Plug-and-play guidance method for flow-based generative models improving sample fidelity without doubling inference cost.
Quantitative approximation rates for neural networks with group equivariance constraints.
Method incorporating causal context into Shapley values for accurate multivariate feature importance measurement.
Pre-training framework leveraging geometric data for efficient neural physics simulation with better transfer.
Analysis of safety challenges in unsupervised elicitation techniques for steering language models toward truthful outputs.
Online learning algorithm with Wasserstein distributionally robust optimization for risk-averse sequential decisions.
Framework for active exploration and model estimation in tabular MDPs based on coverage-based complexity.
Certification method for verifying DNN ownership against model extraction attacks in MLaaS systems.
Differentiable framework using Gaussian reparameterization for scheduling optimization in compilation and synthesis.
Analysis of how protein language models diverge from natural-language transformers, with improved inference methods.
Online alignment method for LLMs under misspecified preference feedback, extending SAIL framework.
Attention Neural Teaching paradigm to reduce training costs for transformer-based attention learners.
Theoretical analysis of drifting models via flow-map decomposition for generative modeling.
Neural network pruning technique balancing compression and information preservation in fully-connected networks.
Graph neural network approach for detecting anomalies in multivariate time-series data with noise robustness.
Research on normalizing flows and invertible neural networks for generative modeling and inverse problems.
Decentralized federated learning approach for multi-task LLM fine-tuning using sparse-orthogonal LoRA over wireless connections.
Apprenticeship learning framework for intelligent tutoring systems addressing sample efficiency and reward function design in educational RL.
Memory-guided prototypical learning approach for mixed emotion recognition from multi-modal physiological and behavioral signals.
ACTOR-CURATOR framework for automated curriculum learning in LLM reinforcement learning post-training via policy-improvement bandits.
Sample-efficient model evidence estimation for prior selection in Bayesian inverse problems using diffusion models.
GenSR framework for symbolic regression using generative latent space to improve equation discovery from data.
Theoretical analysis of Push-Sum decentralized optimization over directed graphs with stability and generalization guarantees.
Benchmarks GNN models on molecular property prediction using CKA-based representation analysis of SMILES fingerprints.
Proposes GATES, a self-distillation method with consensus gating for document-grounded QA without ground truth labels.
Studies online maximization of non-monotone DR-submodular functions over convex sets with improved regret bounds.
Demonstrates triggerless backdoor attacks in vertical federated learning exploiting feature manipulations without visible triggers.
Introduces QEDBench benchmark quantifying alignment gaps in LLM-as-judge evaluation of university-level mathematical proofs.
Proposes TrajGPT-R, a transformer with reinforcement learning for generating privacy-preserving urban mobility trajectories at scale.
Develops Bayesian deep functional learning with structured region selection for sparse effects in complex continuous data.
Proposes UrbanFM, a spatio-temporal foundation model for urban systems that generalizes across regions and tasks.