Continuous-time learning framework for probability distributions, applied to glucose monitoring in a pediatric diabetes clinical trial.
Analysis of why self-distillation degrades LLM reasoning capability by suppressing epistemic verbalization and expression of uncertainty.
Composer 2 model specialized for agentic software engineering with long-term planning and coding abilities trained via continued pretraining and reinforcement learning.
Multi-agent framework with verification for improving calibration and accuracy in medical multiple-choice question answering.
Bayesian optimization method combining penalty formulation and trust region strategy for constrained black-box optimization.
Study evaluating RAG systems on AI policy analysis showing retrieval improvements don't guarantee better answers on complex regulatory documents.
Counterfactual learning approach for conversion rate estimation in recommender systems addressing data sparsity and selection bias.
Inverse-forward differentiation method to reduce memory requirements for backpropagation by avoiding activation storage.
Learning-theoretic framework for coded computing in distributed systems to handle slow, faulty, or compromised servers.
Visualization technique for understanding RNN internal dynamics during training using multislice PHATE algorithm.
Statistical method for heterogeneous treatment effect estimation using local proximity constraints in observational data.
Physics-informed neural networks using wavelet decomposition to improve training on differential equations with rapid oscillations and steep gradients.
arXiv paper on Symmetry-Guided Memory Augmentation (SGMA) improving efficiency of RL-based legged locomotion training.
arXiv paper on machine learning techniques to detect and localize power/radiation leakage of cryptographic keys from hardware implementations.
arXiv paper on multi-agent reinforcement learning for adaptive traffic signal control in heterogeneous urban networks.
arXiv paper: GraphOmni benchmark framework evaluating LLM reasoning on graph-theoretic tasks with diverse formats and serializations.
arXiv paper introducing Distance Explainer method for post-hoc interpretability of embedded vector spaces in ML models.
arXiv paper on Bottlenecked Transformers: KV cache consolidation technique for scaling inference-time reasoning in LLMs.
arXiv paper interpreting neural networks as dynamical systems on latent manifolds, analyzing autoencoder vector fields.
arXiv paper on scalable longitudinal patient pathway modeling from multimodal EHR data using neural networks for condition forecasting.
Research paper demonstrating that LLMs perform in-context reinforcement learning (ICRL) during inference. An ICRL prompting framework enables inference-time self-improvement.
TimeRecipe benchmarks module-level effectiveness of components in time-series forecasting architectures.
Brain foundation model with Cauchy-Schwarz divergence for cross-subject motor imagery EEG decoding in BCIs.
Classification framework using symbolic dynamics, chaotic maps, and data compression for pattern recognition.
DART adds server-side robustness to federated learning for edge devices without expensive client-side computation.
TimeAlign uses contrastive learning and representation alignment for time series forecasting by bridging input-target distributions.
Theoretical analysis of federated distillation with weighted aggregation of client predictions under class mismatch.
Alternative classification approach using signal separation and trigonometric polynomial kernels for compact metric spaces.
PromptLoop refines prompts for diffusion models using sequential reinforcement learning feedback during sampling.
Generative method for synthetic financial time series data to address data shortage in ML models for trading and investment.
Physics-informed neural network for recovering Raman spectra from CARS measurements using scientific theory as inductive bias.
Develops score-based density estimation from pairwise comparisons for learning from human feedback and expert knowledge elicitation.
Proposes future summary pretraining for LLMs as an alternative to next-token prediction, addressing limitations in long-horizon reasoning and planning tasks.
Addresses distribution shift in time-series forecasting by identifying concept drift and temporal shift, proposing mitigation strategies for generalization.
OffSim proposes a model-based offline inverse RL framework that learns environmental dynamics and reward functions from offline data without manual definition.
MedM2T is a multimodal framework integrating sparse time series encoding and hierarchical fusion for healthcare data, including electronic health records and ECG signals.
SigmaDock uses fragment-based SE(3) diffusion for molecular docking in drug discovery, improving upon generative approaches with better chemical plausibility.
Applies deep RL to dynamic origin-destination matrix estimation in traffic simulations, addressing credit assignment across temporal vehicle dynamics.
QUARK is an FPGA acceleration framework using quantization to exploit common patterns in transformer nonlinear operations for efficient inference.
Proposes a curiosity-driven quantized Mixture-of-Experts framework using Bayesian uncertainty for deploying neural networks on resource-constrained devices.
Uses data-driven surrogate models to improve Model Predictive Control for nuclear reactor core simulation.
ContagionRL is a Gymnasium-compatible RL platform for reward engineering in spatial epidemic simulations, enabling systematic study of learned behavioral strategies.
Presents a wild refitting method for excess risk evaluation in empirical risk minimization without requiring knowledge of function class structure.
Investigates model inversion attacks on latent diffusion models, showing non-uniform memorization patterns across latent codes.
Applies quantum-classical physics-informed neural networks to reservoir seepage modeling across multiple flow equations.
Analyzes relationship between deep neural networks and discrete dynamical systems, comparing PINN solutions to standard numerical methods for PDEs.
Develops a Hessian-free actor-critic algorithm for bi-level RL optimization with applications to LLM fine-tuning, addressing second-order information requirements in policy optimization.
Introduces a continual learning task for GUI agents that must adapt to shifting domains and resolutions over time, identifying failure modes in existing agent methods.
Study of variance in agentic system evaluations using 60,000 trajectories on SWE-Bench-Verified. Pass@1 estimates vary significantly across runs, calling single-run reliability assumptions into question.
AceGRPO proposes adaptive curriculum learning with group relative policy optimization for autonomous ML engineering agents. It addresses behavioral stagnation in LLM-based agents through RL with efficient data selection.