VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
Proposes VC-Soup method for aligning LLMs with multiple potentially conflicting human values through value-consistency guided optimization.
LLM-augmented computational phenotyping framework for discovering clinical subphenotypes in Long COVID through iterative hypothesis generation and evidence extraction.
Framework for detecting conflicts in policy languages that use probabilistic ML predicates, applied to semantic router DSL for LLM routing systems.
Improves PDE surrogate model training through gradient-informed temporal sampling strategies that optimize rollout accuracy under fixed data budgets.
Proposes AGRI-Fidelity framework to evaluate reliability of explainable AI for poultry disease detection in noisy farm environments.
Framework for evaluating reasoning-based LLMs on de novo molecular generation and drug discovery without requiring ground-truth molecule pairs.
Proposes Interventional Boundary Discovery to identify causal state dimensions agents can control, using Pearl's do-operator for causal identification.
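The summary above invokes Pearl's do-operator. As background (not the paper's Interventional Boundary Discovery procedure), a minimal sketch of what an intervention means in a toy structural causal model: setting `do(X=x)` severs X's dependence on its cause U, while Y still responds to X. All names here are illustrative.

```python
import random

def scm_sample(do_x=None, rng=random):
    """Toy SCM with chain U -> X -> Y.

    Observational sampling follows the structural equations; an intervention
    do(X=x) replaces X's equation with the constant x, cutting the U -> X edge.
    """
    u = rng.gauss(0.0, 1.0)          # exogenous noise
    x = u if do_x is None else do_x  # do(X=x) overrides the mechanism for X
    y = 2.0 * x + rng.gauss(0.0, 0.1)
    return x, y
```

Under `do(X=1)`, the interventional mean of Y is 2 regardless of U's distribution, which is exactly the kind of controllable-dimension signal such methods probe for.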
Addresses the squeezing effect in Direct Preference Optimization (DPO) for LLM alignment using sharpness-aware minimization in logit space.
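For context, the standard DPO objective that this line builds on can be written per preference pair as a logistic loss on a reference-adjusted log-probability margin. The sketch below shows that base loss only, not the paper's sharpness-aware variant; function and argument names are illustrative.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_w / ref_logp_l: same quantities under the frozen reference model.
    beta: inverse-temperature controlling the strength of the KL anchor.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin): loss shrinks as the chosen response is
    # preferred by a wider reference-adjusted margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At zero margin the loss is log 2; the squeezing effect the paper targets arises from how gradient updates of this loss redistribute probability mass in logit space.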
Studies alignment evaluation in LLMs by examining political censorship in Chinese language models, focusing on routing mechanisms beyond concept detection and refusal behaviors.
Additive Gaussian processes for wind farm power prediction from a population-based structural health monitoring perspective.
Path-constrained mixture-of-experts architecture that restricts expert routing paths to improve statistical efficiency and induce a more meaningful parameter structure.
ALIGN: adversarial learning framework for session-invariant speech neuroprosthesis decoding from brain-computer interfaces.
Neural graph representation learning with RL for approximate subgraph matching, an NP-hard problem in graph analysis.
Autocurriculum training methods with provable benefits for chain-of-thought reasoning in language models with reduced data/compute costs.
Vector-field reward shaping for offline RL to enable safe exploration near dataset boundaries using simulator confidence.
Epistemic GANs using Dempster-Shafer theory, together with architectural enhancements, to improve output diversity in generative models.
Comprehensive book on mathematical foundations of deep learning covering neural network approximation theory, optimal control, RL, and generative models.
RE-SAC: ensemble deep reinforcement learning for bus fleet control that disentangles aleatoric and epistemic uncertainty.
Flow matching approach for de novo molecular structure elucidation from mass spectra using deep generative models.
AFBS-BO framework for automated hyperparameter optimization of sparse attention mechanisms in transformers via adaptive fidelity Bayesian optimization.
Quantum multi-armed bandit and stochastic linear bandit algorithms robust to NISQ-device noise, achieving quadratic speedups over classical methods.
Sample-efficient reward estimation method for RL with verifiable rewards in large language model post-training.
Training suite for film shot language understanding using vision-language models to match expert cinematographic analysis.
Distributed asynchronous RL framework for Vision-Language-Action models with integrated trainable world models.
Calibration-free pruning method for Mixture-of-Experts language models to reduce memory and serving overhead.
Policy optimization approach addressing overthinking in large reasoning models through difficulty-differentiated training.
Study on synthetic data augmentation for efficient pre-training with better loss scaling using synthetic megadocs.
Research on active auditing framework against backdoor attacks in decentralized federated learning systems.
GAPSL: gradient-aligned parallel split learning for federated learning on heterogeneous data, reducing client computational load.
Transfers statistical methods from particle physics for UAV propeller fault detection using spectral features and neural inference.
SINDy-KANs combines Kolmogorov-Arnold networks with sparse identification to learn interpretable equations for nonlinear dynamical systems.
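The sparse-identification half of SINDy-KANs rests on sequentially thresholded least squares over a library of candidate terms. A minimal sketch of that base regression on a toy system (plain polynomial library, no KAN features, assumed noiseless data for clarity):

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares, the sparse regression at the
    core of SINDy: fit, zero out small coefficients, refit on the survivors."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Toy system dx/dt = -2x, with library [1, x, x^2].
x = np.linspace(-1.0, 1.0, 200)
dxdt = -2.0 * x
theta = np.column_stack([np.ones_like(x), x, x**2])
xi = stlsq(theta, dxdt)  # expect coefficient -2 on x, zeros elsewhere
```

The recovered coefficient vector reads off the governing equation directly; replacing the polynomial library with learned KAN basis functions is the paper's extension.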
Shows Transformers learn robust in-context regression under distributional uncertainty without restrictive assumptions on data and noise.
SpecForge: open-source training framework for speculative decoding draft models, reducing LLM inference latency through token batching.
Demonstrates adversarial attacks on GNNs exploitable through unlearning mechanisms designed for GDPR compliance in graph learning systems.
Systematic analysis of Elastic Weight Consolidation for continual learning, identifying issues with importance estimation and weight regularization methods.
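The EWC regularizer under analysis above is a quadratic penalty anchoring parameters to their values after a previous task, weighted per-parameter by a diagonal Fisher-information estimate. A minimal sketch of that penalty (names illustrative; the Fisher estimate itself is assumed given):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC regularizer: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta: current parameters; theta_star: parameters after the old task;
    fisher: diagonal Fisher estimate of each parameter's importance.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
```

The importance-estimation issues the study identifies concern how well that diagonal Fisher term actually reflects which parameters the old task depends on.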
Evaluates model-free policy optimization algorithms using exact blackjack oracle with ground-truth benchmarks for discrete stochastic control.
Investigates multi-corpus training in speech spoofing detection using self-supervised learning, finding domain-specific biases harm generalization.
Studies label inference attacks in vertical federated learning, analyzing vulnerabilities when passive parties infer active party's labels and proposing defenses.
HISR proposes segmental process rewards for multi-turn RL in LLM agents, addressing sparse reward propagation and credit assignment in long-horizon decision-making tasks.
Investigates transfer learning from audio and time-series foundation models to scientific time-series via cross-domain distillation.
Proposes OCP method for improving item embeddings in large-scale commodity recommendation systems.
Studies off-policy learning in contextual bandits with supply constraints for recommendation and advertising systems.
Causal-theoretic approach for reward modeling using observational user feedback instead of expensive annotated data for RLHF alignment.
Ablation study examining necessity of components in Group Relative Policy Optimization for teaching LLMs reasoning and mathematical ability.
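One component such an ablation would isolate is GRPO's group-relative advantage: rewards for a group of sampled completions are standardized within the group, replacing a learned critic. A minimal sketch of that baseline computation (not the full GRPO update):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each completion's reward within
    its sampled group, so the group mean serves as the baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)
```

Ablations typically toggle pieces like the std normalization or the KL term to test which ones the reasoning gains actually depend on.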
Deep VAE-GAN approach improving reservoir parameterization for data assimilation in petroleum reservoir simulation.
AutoPipe framework for automatically configuring LLM post-training pipelines combining supervised fine-tuning and reinforcement learning under budget constraints.
Study on using discriminators to enhance generative model training across GANs, weak learner frameworks, and diffusion models.
Method mitigating asynchronous data drift in federated learning where different devices experience different distribution shifts.
Neuroscience framework introducing authority-level priors to hierarchical predictive processing for understanding autonomic regulation.
Theoretical error analysis of the Adam optimizer for deep neural network training and beyond, addressing open gaps in its convergence theory.