Simple LLM Baselines are Competitive for Model Diffing
Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs
Deep learning outperforms traditional machine learning methods in predicting childhood malnutrition: evidence from survey data
Time-to-Event Transformer to Capture Timing Attention of Events in EHR Time Series
Colorful Talks with Graphs: Human-Interpretable Graph Encodings for Large Language Models
Affordances Enable Partial World Modeling with LLMs
Tensor Methods: A Unified and Interpretable Approach for Material Design
Experimental Demonstration of Online Learning-Based Concept Drift Adaptation for Failure Detection in Optical Networks
Modular Multi-Task Learning for Chemical Reaction Prediction
Gated Removal of Normalization in Transformers Enables Stable Training and Efficient Inference
LUCID: Attention with Preconditioned Representations
LightGTS-Cov: Covariate-Enhanced Time Series Forecasting
AI-rithmetic
Equivariant Evidential Deep Learning for Interatomic Potentials
Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning
Breaking the Curse of Repulsion: Optimistic Distributionally Robust Policy Optimization for Off-Policy Generative Recommendation
QTALE: Quantization-Robust Token-Adaptive Layer Execution for LLMs
A Dual-Stream Physics-Augmented Unsupervised Architecture for Runtime Embedded Vehicle Health Monitoring
Control Reinforcement Learning: Token-Level Mechanistic Analysis via Learned SAE Feature Steering
LakeMLB: Data Lake Machine Learning Benchmark
Chamfer-Linkage for Hierarchical Agglomerative Clustering
A Unified Theory of Random Projection for Influence Functions
Constructing Industrial-Scale Optimization Modeling Benchmark
A Multimodal Conditional Mixture Model with Distribution-Level Physics Priors
Analyzing Fairness of Neural Network Prediction via Counterfactual Dataset Generation
Driving Reaction Trajectories via Latent Flow Matching
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks
Enhancing Ride-Hailing Forecasting at DiDi with Multi-View Geospatial Representation Learning from the Web
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving
Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models
A Swap-Adversarial Framework for Improving Domain Generalization in Electroencephalography-Based Parkinson's Disease Prediction
What Makes Value Learning Efficient in Residual Reinforcement Learning?
Bridging the Compression-Precision Paradox: A Hybrid Architecture for Clinical EEG Report Generation with Guaranteed Measurement Accuracy
$\mu$pscaling small models: Principled warm starts and hyperparameter transfer
Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss
Online Min-Max Optimization: From Individual Regrets to Cumulative Saddle Points
Gauss-Newton Unlearning for the LLM Era
LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization
When Gradient Clipping Becomes a Control Mechanism for Differential Privacy in Deep Learning
Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity
TRACE: Theoretical Risk Attribution under Covariate-shift Effects
Roughness-Informed Federated Learning
Learning Mixture Density via Natural Gradient Expectation Maximization
dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning
Hierarchical Zero-Order Optimization for Deep Neural Networks
On the Role of Consistency Between Physics and Data in Physics-Informed Neural Networks
Pupillometry and Brain Dynamics for Cognitive Load in Working Memory
Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling