A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation
Multi-fidelity policy gradient method for reinforcement learning using low-fidelity simulators to improve sample efficiency.
Incentive mechanism for federated learning that prioritizes high-quality contributions during critical learning periods.
Analysis of gradient compression, staleness, and data heterogeneity interactions in asynchronous federated learning systems.
Defense against backdoor attacks in federated learning using representative-attention mechanisms to detect behavioral anomalies.
Memory-efficient fine-tuning of quantized LLMs using zeroth-order optimization to eliminate gradient and optimizer state storage.
Metric for evaluating generalization in diffusion distillation models via probability flow distance.
Analysis of fairness impacts when using task vectors for efficient model editing through task arithmetic operations.
End-to-end framework discovering task-relevant symmetries through learnable augmentations in equivariant neural networks.
Framework for autonomous scientific discovery using LLMs guided by Bayesian surprise to identify novel research questions without human direction.
Learning collective variables for molecular dynamics simulations using time-lagged generation to accelerate rare event sampling.
Neural architecture search using graph-based evidence of architectural modifications for efficient fine-grained network design.
Ensemble method combining text embeddings while accounting for model-specific uncertainty across domains and tasks.
Framework for machine unlearning in conformal predictors that removes influence of specific data while maintaining prediction coverage.
Membership inference attacks adapted to time series forecasting models, analyzing privacy risks in temporal prediction systems.
Diffusion bridge variational inference for improving posterior inference in deep Gaussian processes.
Binary autoencoder method for interpreting LLM hidden states with improved feature sparsity and atomization guarantees.
Prior-data fitted networks applied to the graph domain, addressing transferability and data scarcity challenges in graph foundation models.
DistillKac generates images in a few steps using damped wave equations with finite-speed transport, as an alternative to diffusion models.
DRIFT-Net uses spectral-coupled neural operators for learning PDE dynamics with improved efficiency over classical solvers.
OpenTSLM integrates time series as native modality into LLMs for clinical data reasoning, addressing LLM limitations with temporal data.
Hierarchical approach to address spurious correlations in supervised learning under distribution shifts, extending Group DRO methods.
KVComm framework enables efficient multi-agent LLM communication by sharing key-value cache instead of natural language or hidden states, reducing inference costs.
Analysis of alignment tipping process where self-evolving LLM agents abandon safety constraints through continual interaction.
Primal-dual DPO algorithm with convergence guarantees for constrained LLM alignment with safety constraints.
Data-dependent error bounds for Gibbs and Langevin algorithms in overparameterized interpolation regime.
Portfolio approach to time series forecasting using ensembles of smaller pretrained models instead of monolithic foundation models.
Theoretical foundation for reinforcement learning with verifiable rewards via gradient gap analysis at trajectory and token levels.
Theoretical analysis of Mamba's in-context learning capability on low-dimensional nonlinear target functions.
Analysis of partial prototype collapse in prototypical self-supervised learning with diagnostic and prevention methods.
Test-time alignment method for LLMs using sampling-based optimal control with Gaussian perturbation in pre-logit space.
Position paper on privacy-preserving, federated AI systems for elderly monitoring beyond fall detection.
Self-adaptive ensemble method for graph neural networks that selects best model per sample without additional training.
Hierarchical latent diffusion model generating phonocardiogram signals from clinical metadata for medical data augmentation.
Neurosymbolic framework integrating modal logic with neural networks for reasoning about necessity and possibility.
Theoretical framework explaining gradient information recovery window in gated recurrent networks via effective learning rates.
Direct steering optimization method for mitigating demographic bias in vision-language models with user-controlled tradeoffs.
Analytical model explaining LLM-as-a-judge inference-time scaling using Bayesian regression and reward sampling.
Metric for evaluating multi-horizon time series forecasts accounting for accuracy and temporal consistency.
Decoupled diffusion framework for inverse PDE problems using unconditional diffusion and neural operators.
Analysis of neural network performance against theoretical limits using exact posteriors from normalizing flows, examining scaling laws and uncertainty decomposition.
Minerva applies reinforcement learning with verifiable rewards to train LLMs for cyber threat intelligence standardization tasks.
CardinalGraphFormer applies graph transformers to molecular property prediction with attention augmentation for drug discovery applications.
Study of reinforcement learning for autonomous cyber defense agents, examining reward function design beyond traditional dense reward approaches.
CoSA proposes a compressed-sensing-based approach for parameter-efficient fine-tuning of LLMs, addressing expressivity limitations of low-rank decomposition methods like LoRA.
Analysis of GRPO limitations in exploration and difficulty adaptation stemming from implicit advantage symmetry in reward estimation.
Mathematical note on martingale theory, conditional expectation, and applications to branching processes.
Evolutionary algorithm that automatically generates LLM-based multi-agent system architectures without the limitations of code generation.
Theory predicting neural scaling law exponents from natural language statistics for data-limited LLM scaling.
Causal Schrödinger Bridges use constrained optimal transport for robust generative modeling under causal interventions.
Theoretical analysis of covariate shift as positive distribution shift for understanding tractable learning scenarios.