HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
Training method for LLMs on mathematical reasoning combining RL with privileged self-distillation to improve learning on hard problems.
C++ implementation of neural network verification tool supporting bound propagation methods for DNN formal analysis.
Safe reinforcement learning method addressing constraint violations in off-policy exploration through constrained optimistic exploration Q-learning.
ML method for deep-sea microbial analysis that uses knowledge-enhancement techniques to compensate for small datasets.
Online multi-robot task assignment and route scheduling in smart factories using wireless communication under partial observability.
DIET: structured pruning method for LLMs using dimension-wise global importance scores that adapt to task-specific requirements.
Uses LLMs to generate portable patient embeddings from clinical time series that transfer across hospitals with minimal retraining.
Analysis of design challenges in iterative generative optimization using LLMs for self-improving agents; identifies hidden choices engineers must make.
Dimension-free zeroth-order estimator for PINNs addressing spatial derivative complexity and memory overhead in high-dimensional PDEs.
Iterative unsupervised framework for feature selection and clustering in high-dimensional data by recovering influential features.
Generative framework using Lagrangian relaxation-guided score-based generation to solve mixed-integer linear programming with diverse solutions.
MoE-Sieve: routing-guided LoRA fine-tuning framework for MoE models that adapts to skewed expert routing patterns for efficiency.
Investigates optimal sensor placement for GNN-based leakage detection in water distribution networks.
Dual guidance approach for RL-based LLM training combining external verification and internal experience to improve reasoning task performance.
Graph representation learning for analog circuit electrical equivalence to support electronic design automation tasks.
Causal inference framework for learning disentangled representations from multiplex graphs by separating shared and layer-specific information.
RLHF-aligned LLMs exhibit response homogenization limiting uncertainty estimation; analyzes alignment tax impact across different tasks and sampling methods.
Gossip-based distributed machine learning algorithms for IoT networks with privacy constraints and limited computation/communication resources.
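As a generic illustration of the gossip primitive underlying such methods (not the paper's specific algorithm; the topology and values here are made up), each node repeatedly averages its local value with a random neighbor, and all nodes converge to the global mean without any central coordinator:

```python
import numpy as np

def gossip_average(values, neighbors, rounds=200):
    """Randomized pairwise gossip: each round, one node averages
    its value with a uniformly chosen neighbor. Pairwise averaging
    preserves the global sum, so all nodes converge to the mean."""
    rng = np.random.default_rng(0)
    x = np.array(values, dtype=float)
    for _ in range(rounds):
        i = rng.integers(len(x))              # wake a random node
        j = rng.choice(neighbors[i])          # pick one of its neighbors
        x[i] = x[j] = (x[i] + x[j]) / 2.0     # pairwise averaging step
    return x

# Hypothetical ring of 4 IoT nodes holding local measurements.
vals = [1.0, 3.0, 5.0, 7.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
est = gossip_average(vals, ring)
```

The same primitive extends to model parameters instead of scalars; communication stays local and peer-to-peer, which is what makes it attractive under IoT resource and privacy constraints.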
Graph convolutional networks using reservoir computing to handle complex, dynamic graph data and capture long-range dependencies.
Bayesian optimization framework for tuning control policies using human preferences and pairwise comparisons instead of quantitative evaluations.
Neural operator learning method combining linear and nonlinear effects for efficient PDE solving without repeated solution computation.
FPGA-based implementation of weightless neural networks using Tsetlin automata for on-chip training and inference with low latency and complexity.
Scalable RL pipeline for improving LLM code generation through synthetic data and curriculum learning, addressing data diversity challenges at scale.
Transformer architecture for multivariate time series forecasting using multi-resolution representations to capture short-term and long-range dependencies.
Study on privacy vulnerabilities in deep learning time series imputation models, demonstrating membership inference attacks in black-box settings.
Nonnegative matrix factorization approach using maximum-volume basis vectors for identifying NMF solutions in highly mixed data.
Methods for assessing adversarial attack vulnerability and augmenting identity recognition models trained on small LiDAR skeleton datasets.
Framework for probabilistic time series forecasting that explicitly models heteroscedasticity and time-varying conditional variances in nonstationary dynamics.
ReGuider representation-level supervision method improves time series forecasting by capturing extreme patterns and salient dynamics in temporal representations.
DeepDTF dual-branch transformer framework predicts cancer drug response from multi-omics data, addressing cross-modal alignment in precision oncology.
Vision-language model approach for image clustering using LLM-generated text features with adaptive semantic centers to improve inter-class discriminability.
Cost-Sensitive Neighborhood Aggregation (CSNA) GNN layer uses per-edge routing to treat heterophilous edges differently depending on whether they lie in an adversarial or an informative regime.
Framework uses LLMs to automatically design reward functions for cooperative multi-agent reinforcement learning, synthesizing executable reward programs from environment instrumentation.
Multi-agent reinforcement learning approach for decentralized adaptive traffic signal control using learned coordination in partially observable environments.
MolEvolve framework uses LLM guidance with evolutionary search for interpretable molecular optimization, addressing activity cliffs and lack of interpretability.
LSTM functional models learn nonlinear mappings from wave-vessel time series to predict parametric roll episodes and statistical shifts in ship responses.
CUA-Suite dataset provides massive human-annotated continuous video demonstrations for training computer-use agents on desktop automation tasks, addressing data bottleneck.
Transfer learning framework using LSTM and conformal prediction for lithium-ion battery state-of-health forecasting across manufacturing variations.
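For context on the conformal-prediction component, here is a minimal split-conformal sketch on synthetic residuals (not the paper's model; the Gaussian data and 90% level are illustrative assumptions). A quantile of held-out calibration residuals gives a radius whose symmetric interval covers new points at roughly the target rate:

```python
import numpy as np

def conformal_radius(residuals_cal, alpha=0.1):
    """Split conformal prediction: the ceil((n+1)(1-alpha))-th smallest
    calibration residual gives a radius q such that [yhat - q, yhat + q]
    covers a fresh exchangeable point with probability >= 1 - alpha."""
    n = len(residuals_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # conformal quantile index
    return np.sort(residuals_cal)[min(k, n) - 1]

rng = np.random.default_rng(1)
# Pretend the point forecaster predicts 0, so residuals are |y|.
y_cal = rng.normal(0.0, 1.0, 500)
q = conformal_radius(np.abs(y_cal), alpha=0.1)
y_test = rng.normal(0.0, 1.0, 2000)
coverage = np.mean(np.abs(y_test) <= q)       # empirically near 0.9
```

The appeal for state-of-health forecasting is that the coverage guarantee is distribution-free, so it can wrap an LSTM transferred across manufacturing variations without re-deriving error models.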
Theoretical work on uniform laws of large numbers in product spaces extending VC dimension theory under product distribution assumptions.
Sequential-AMPC uses recurrent neural networks to approximate nonlinear model predictive control offline, reducing online computation for embedded hardware control systems.
AI agents using Claude Code autonomously discovered novel adversarial attack algorithms for LLMs that outperform 30+ existing methods in jailbreaking and prompt injection attacks.
Agentic Variation Operators replace fixed mutation/crossover in evolutionary search with autonomous coding agents consulting lineage and domain knowledge.
TuneShift-KD enables knowledge distillation and transfer of fine-tuned specialized knowledge to newer LLM architectures without access to original training data.
Multi-dimensional evaluation framework for uncertainty attribution methods in explainable AI addressing inconsistent evaluation across heterogeneous proxy tasks.
UI-Voyager is a self-evolving mobile GUI agent using rejection fine-tuning and credit assignment to learn from failed trajectories in long-horizon tasks.
RAVEN applies generative pretraining to structured electronic health records using recurrence-aware next-visit event prediction on 1M+ patient dataset.
DreamerAD enables efficient RL for autonomous driving via a latent world model, achieving an 80x speedup by compressing diffusion sampling from 100 steps to 1.
Multilevel Euler-Maruyama method accelerates diffusion model solving via multi-level approximators with polynomial speedup in HTMC regime.
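For readers unfamiliar with the multilevel idea, here is a generic multilevel Euler-Maruyama Monte Carlo sketch on geometric Brownian motion (a standard toy SDE, not the diffusion-model setting of the paper; all parameters are illustrative). The estimator telescopes a cheap coarse level plus coupled fine-minus-coarse corrections:

```python
import numpy as np

def em_path(x0, mu, sig, T, n_steps, dW):
    """Euler-Maruyama discretization of dX = mu*X dt + sig*X dW."""
    x, dt = x0, T / n_steps
    for k in range(n_steps):
        x = x + mu * x * dt + sig * x * dW[k]
    return x

def mlmc_mean(mu=0.05, sig=0.2, T=1.0, L=4, n_samples=20000, seed=0):
    """Multilevel estimator of E[X_T]: level l uses 2^l Euler steps;
    corrections E[f_l - f_{l-1}] use coupled Brownian increments
    (coarse increments are sums of consecutive fine increments)."""
    rng = np.random.default_rng(seed)
    est = 0.0
    for l in range(L + 1):
        nf = 2 ** l
        dW = rng.normal(0.0, np.sqrt(T / nf), (n_samples, nf))
        fine = np.array([em_path(1.0, mu, sig, T, nf, w) for w in dW])
        if l == 0:
            est += fine.mean()
        else:
            dWc = dW.reshape(n_samples, nf // 2, 2).sum(axis=2)
            coarse = np.array([em_path(1.0, mu, sig, T, nf // 2, w)
                               for w in dWc])
            est += (fine - coarse).mean()   # telescoping correction
    return est

est = mlmc_mean()   # true E[X_T] = exp(mu*T) ~ 1.0513
```

Because the coupled corrections have small variance, most samples can be spent on the cheap coarse level, which is the source of the speedup the multilevel analysis formalizes.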
KARMA applies LLMs to personalized search at Taobao by addressing knowledge-action gap through regularized multimodal alignment for next-item prediction.
Deletion-Insertion Diffusion language models replace masking paradigm with discrete diffusion processes for improved computational efficiency and generation flexibility.