Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR
Benchmark environment for studying reward hacking in RL agents through dual-access mathematical reasoning tasks.
InvAdam optimizer variant that improves generalization by finding flatter minima than standard Adam.
AI agent framework using offline RL for structured planning and reasoning in image editing tasks.
LLM-based system for automated CUDA kernel optimization across ML and scientific computing domains.
Method to improve OOD detection by diversifying parameter contribution patterns in classifiers.
GNN surrogate model for simulating reinforced concrete beams under bending using spatiotemporal graphs.
wDPO improves DPO for LLM alignment by using winsorization to handle noisy preference data robustly.
Theoretical analysis of margin-based learning in metric spaces and generalization guarantees independent of parameter count.
Empirical study on knowledge distillation and difficulty-aware training for improving LLM performance in finance domain.
Lightweight UNet-style architecture for 3D medical image segmentation with learned spatial anchors and anatomical priors.
PT-RAG uses retrieval-augmented generation to predict cellular responses to gene perturbations with improved generalization.
WeDas framework improves web search agents by matching queries to web content distribution structures for better evidence retrieval.
Federated learning approach for predicting secondary cancer using heterogeneous features across hospitals.
Symbolic machine learning method to convert chaotic time series into interpretable algebraic equations for forecasting.
Multi-objective reinforcement learning applied to outpatient clinic scheduling with adaptive double-booking policies.
AutoResearch-RL is an RL agent that autonomously conducts perpetual neural architecture and hyperparameter search via code modification without human supervision.
Retrieval-augmented multi-scale framework for county-level crop yield prediction addressing regional and temporal challenges in agricultural forecasting.
Adversarial latent-state training framework for robust policies in partially observable MDPs under latent distribution shift with theoretical guarantees.
ShakyPrepend applies differential privacy-inspired tools to multi-group learning for improved sample complexity and adaptation to group structure.
Analyzes norm-hierarchy transitions to explain when neural networks shift from spurious shortcuts to structured representations during training.
Learning concept bottleneck models from mechanistic explanations instead of pre-specified or LLM-prompted concepts for improved interpretability and predictive power.
Addresses representation entanglement between physiologic signal and institutional artifacts in clinical ML under systematic distribution shift from heterogeneous practices.
Develops tunable-complexity priors for diffusion models and normalizing flows to balance representation error and overfitting in inverse problem solving.
N-Tree Diffusion enables efficient long-horizon wildfire risk forecasting by hierarchically extending diffusion models across multiple prediction steps.
Examines neural scaling laws in the sub-20M-parameter regime for TinyML/edge AI, showing that both ConvNets and MobileNetV2 follow power-law error scaling.
Hierarchical multi-agent RL framework for controlling reconfigurable intelligent surfaces in mmWave systems without channel state information estimation overhead.
Accelerates multi-task learning gradient balancing through bi-level optimization to improve MGDA-type methods for handling task conflicts.
Deterministic fuzzy triage system for legal compliance classification using dual encoders and transparent bands, demonstrated on aligning contractual evidence with HIPAA/NERC-CIP.
Generalizes linear autoencoder recommender systems by decoupling expected quadratic loss to improve hyperparameter flexibility beyond prior constraints.
DualSpec accelerates LLM-based research agents by speculating on actions during reasoning to reduce latency in long-horizon information-seeking tasks with tool use.
Data Agent uses end-to-end optimization to dynamically select informative samples and accelerate training.
Cost-driven state representation learning for control tasks from high-dimensional partial observations.
Tokenization approach enables transformers to outperform gradient boosting on tabular forecasting tasks.
Diffusion transformer framework generates 3D genome structures conditioned on Hi-C contact maps.
Unified framework for knowledge transfer between models of different sizes, enabling bidirectional scaling.
OCLADS framework for continual learning in IoT anomaly detection under non-stationary data distributions.
Theoretical analysis connecting drifting models and score-based generative models through kernel-weighted discrepancy.
RL framework optimizes cleaning schedules for solar panels using PPO algorithm in arid regions.
Method for transferring knowledge from pre-trained models to different architectural scales using frequency-domain information.
Neural dynamics-informed pre-training framework for personalized brain functional network construction addressing heterogeneous neural activity patterns.
Data-driven approach using dynamic latent space representations for generative prediction of laser-induced rocket ignition with uncertainty quantification.
Obliviator method revealing vulnerability of concept erasure to nonlinear adversaries, analyzing statistical dependencies in representation unlearning.
ECG classification on PTB-XL dataset using simplified CNN-VAE with data-centric approach for cardiovascular disease detection.
Constraints Matrix Diffusion-based generative neural solver for vehicle routing problems emphasizing local optimization and small-scale generalization.
TS-MLLM: multi-modal LLM framework for industrial time-series analysis combining temporal signals, frequency-domain visuals, and textual knowledge for prognostics.
TT-Sparse: neural building block for learning interpretable sparse rule models using differentiable truth tables balancing performance and human-understandable complexity.
Visual representation framework encoding signals as low-rank adaptations to frozen diffusion foundation models for compact storage and reuse.
Helix: evolutionary reinforcement learning system combining LLMs with RL for open-ended scientific problem solving with improved exploration and generalization.
Critical review synthesizing classical numerical methods and machine learning approaches for solving PDEs, examining six fundamental computational challenges.
Theoretical analysis of relationships between surrogate losses and evaluation metrics, addressing metric mismatch between offline validation and online performance.