Safety Training Persists Through Helpfulness Optimization in LLM Agents
Study of whether safety training persists in multi-step agentic LLM settings when optimizing for helpfulness, including a comparison of DPO effects.
Generalized discrete diffusion model with self-correction during pretraining, using a uniform-absorbing objective.
Principled mathematical framework for reward modeling leveraging ordinal preference feedback from human annotators for LLM alignment.
Personalized federated learning approach using kernel mean embeddings to learn inter-agent weight combinations without raw data sharing.
Framework for automatic specification generation that supports higher-level semantic constraints, aiming to improve adoption of neural network verification tools.
CUDABench benchmark for evaluating LLM text-to-CUDA code generation with performance assessment metrics for GPU kernels.
Method for steering LLM behavior via representation manipulation that accounts for heterogeneous concept encoding across embedding spaces.
Theoretical analysis of length generalization bounds for transformers on the CRASP language class, addressing generalization guarantees.
Hypergraph neural network approach for predicting network controllability robustness against attacks, replacing computationally expensive simulations.
Label-guided distance scaling method for few-shot text classification, improving meta-learner effectiveness with selective label guidance.
PRISM foundation model for EEG diagnosis using masked autoencoders, ablated across pretraining populations and clinical adaptation domains.
Graph transformer framework (NETRA) for prioritizing disease genes in Alzheimer's networks using multimodal biological data.
Comparative analysis of UMAP with PCA, Kernel PCA, SIR variants, and t-SNE for dimensionality reduction across benchmarks.
Addressing catastrophic forgetting in class-incremental learning by analyzing temporal imbalance of positive/negative supervision signals.
Quantum-enhanced LoRA fine-tuning method for few-shot AI-generated content detection, combining quantum neural networks with low-rank adaptation.
Preconditioning techniques for flow matching and score-based diffusion to improve optimization by handling ill-conditioned covariance matrices.
Diffusion-based model predictive control with discrete denoising for game playing, tested on Tetris with feasibility constraints and critic alignment.
Joint inference of epidemic parameters and mobility networks in metapopulation models using an encoder-decoder architecture.
Learning optimal threshold-based stopping rules for parking problems with unknown Poisson arrival processes via jump intensity estimation.
Proposes rigidity-aware geometric pretraining for protein design and conformational ensembles using global geometric representations.
Studies personalized multi-agent average reward TD learning with joint linear approximation, inspired by federated learning approaches.
Analyzes temperature parameter selection in knowledge distillation and its interaction with optimizer, pretraining, and finetuning choices.
Introduces loss-level spectral regularization using Fourier and wavelet-domain losses to improve diffusion model training without architecture changes.
Studies computational reducibility in neural solvers for graph combinatorial optimization, enabling model generalization across task distributions.
MUSE is an open-source platform for multimodal safety evaluation of LLMs with cross-modal payload generation and multi-turn attack algorithms.
Proves selection theorems showing that low average-case regret forces AI agents to develop internal world models or belief states for robust decision-making.
ParEVO uses LLM-based agentic evolution to synthesize parallel code for irregular data structures, addressing limitations of standard models on concurrent programming.
Studies thermodynamic regulation of finite-time Gibbs chain training in Restricted Boltzmann Machines, analyzing energy landscape evolution during learning.
Establishes theoretical connection between classifier-free guidance in diffusion models and Anderson acceleration via Hopfield dynamics.
Presents EdgeFLow, a federated learning framework using sequential model migration in edge networks to reduce communication bottlenecks in IoT systems.
Derives Wasserstein Proximal Policy Gradient using optimal transport geometry for continuous-action entropy-regularized RL, without evaluating policy log-densities.
Develops parameter-free temporal difference learning for RL that avoids requiring problem-dependent quantities like feature covariance eigenvalues.
Studies DNN partitioning and resource allocation for device-edge collaborative inference under jamming attacks on resource-constrained systems.
Introduces HACRL, a collaborative reinforcement learning paradigm where heterogeneous agents share verified rollouts during training but execute independently at inference.
Proposes diffusion actor-critic with flow matching for real-time autonomous driving policies, addressing inference latency in generative RL approaches.
Neural networks for volatility forecasting on financial time series exhibit underspecification: different optimizers reach identical test loss yet learn different functions.
Theoretical analysis of implicit regularization in Deep Linear Discriminant Analysis for metric learning objectives.
Multi-objective reinforcement learning method for extracting Pareto fronts of policies in continuous control tasks, addressing trade-offs between multiple objectives.
Bandit-based prompt optimization for multi-agent systems using graph neural networks to improve LLM-powered workflow performance without modifying workflows.
Heterogeneous analog-digital computing approach for efficient Mixture-of-Experts inference with theoretical generalization guarantees and hardware nonideality mitigation.
SaFeR-ToolKit formalizes multimodal safety as a checkable protocol, using virtual tool calling in vision-language models to prevent jailbreaks.
HomeAdam, a variant of the Adam optimizer, improves generalization bounds to match SGD convergence rates for deep learning model training.
SAGE method improves diffusion planners for offline RL by using latent consistency signals to penalize dynamically inconsistent plans at inference-time.
Two-Stage Causal-GRPO framework addresses shallow safety alignment in LLMs vulnerable to adversarial prefix attacks through semantic intent pinning.
Paradigm for causal structure learning from observational data that leverages human causal knowledge to address the combinatorial explosion of possible graphs.
Unified framework addressing both missing and noisy modalities in multimodal learning to improve robustness on low-quality real-world data.
Empirical evaluation of uncertainty-based selective prediction reliability in multimodal clinical condition classification using ICU data.
Theoretical analysis of factorized gradient descent for low-tubal-rank tensor recovery from noisy linear measurements under t-product framework.
Training recipe enabling FP4 efficiency for large-scale Mixture-of-Experts models on Hopper GPUs without native 4-bit support.
BoGA framework combines evolutionary search with Bayesian optimization for protein sequence design and function prediction.