Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization
Provably efficient algorithms for robust MDPs with general policy parameterization, reducing to entropy-regularized formulations.
Mathematical framework explaining LLM generalization through sparse low-dimensional manifold geometry of activation states rather than parameter count.
Efficient steering method for unconditional diffusion models enabling controllable generation without gradient guidance or retraining.
Investigates whether a single learned representation can optimize multiple reward functions in RL, studying representation learning for multi-objective policies.
CADET applies decoder-only transformer architecture to CTR prediction in ads systems, addressing challenges of contextual post-scoring constraints.
TimeSynth framework for evaluating time series forecasting models and uncovering systematic biases in benchmarking nonlinear vs linear approaches.
Interpretable image classification using hierarchical concept embeddings derived from vision-language models for sparse concept recovery.
Deep learning approach for classifying lower back movements from motion tape sensor data to support remote physical therapy monitoring.
Novel sparse recovery method for Poisson inverse problems using Bregman proximity operators and NoLips algorithm.
Research on Generative Flow Networks: methods for biasing GFlowNets toward high-reward solutions in combinatorial optimization.
Partial GFlowNet accelerates convergence in large state spaces via planner-based partitioning, improving generative flow network scalability.
Framework for fair consensus clustering in multi-agent streaming environments under proportionate fairness constraints.
Calibrates auxiliary predictors for multinomial logit models with missing consumer choice data in market estimation.
RooflineBench framework benchmarks small language models on edge hardware using roofline analysis for performance characterization across architectures.
Unified framework addressing reward hacking and optimization instability in RLHF by combining KL regularization and policy ratio clipping.
Adaptive Milestone Reward addresses temporal credit assignment in RL-trained GUI agents by balancing outcome and process reward with adaptive thresholds.
PASCAL phase-aware scheduling algorithm optimizes serving of reasoning-based LLMs by distinguishing reasoning and answering phases for improved latency.
AltTS dual-path framework with alternating optimization separates autoregressive dynamics from cross-dimension interactions in multivariate time series forecasting.
Krause Attention mechanism prevents representation collapse in transformers by decoupling softmax normalization inspired by bounded-confidence dynamics.
Proactive anomaly detection using forward and backward forecast modeling for early warning signals in industrial, financial, and cybersecurity applications.
Native Reasoning Models trains language models to reason without external verifiers or fully-annotated data, extending RLVR paradigm to unverifiable domains.
TS-Memory adds plug-and-play memory module to time series foundation models for efficient adaptation under distribution shift without catastrophic forgetting.
Studies implicit bias of mini-batch stochastic steepest descent in multiclass classification under various norm geometries.
Benchmark evaluating foundation models pretrained on brain electrical signals for EEG and intracranial recordings in neuroscience.
Analyzes gradient compression effects on loss landscapes in federated learning, proposing sharpness-aware minimization as a remedy for degraded generalization.
Framework enabling masked diffusion models to perform token correction after unmasking, reducing error accumulation in parallel generation.
SkillRater decomposes data quality into multidimensional capabilities rather than single scores, improving data curation for model training.
Evaluates transfer learning performance of large-scale chemical language models across downstream molecular property prediction tasks.
TreeGrad-Ranker uses probabilistic values to rank features in decision trees for local prediction explanation via efficient gradient computation.
ArGEnT transformer learns solution operators for physical systems with complex geometries, enabling surrogate modeling across varying parametric settings.
Graph prompt learning framework adapting pre-trained GNNs across domains via adaptive fusion and cross-domain knowledge transfer.
Defense against gradient inversion attacks in federated learning using targeted interpretable perturbation preserving model utility.
LLM-guided out-of-distribution detection for text-attributed graphs combining topology and text features for unseen patterns.
First-order algorithms for online bilevel optimization eliminating need for Hessian-vector product computation.
ML-based system detecting knee injuries in runners using optical motion capture and explainable machine learning.
Benchmark of complex deep research tasks across 10 domains drawn from real-world usage patterns, measuring accuracy, completeness, and objectivity.
Framework weighting training samples by quality factors including gradient consistency and verification status for specialized expert data.
Recursive Transformer architecture learning hierarchical dependencies through multi-resolution recursion with shared layers.
Framework for tabular prediction using explicit in-table evidence selection, making row context auditable and interpretable.
Physics-based state estimation method using potential-energy gating for robust filtering in bistable stochastic systems.
Diffusion LLM approach for CUDA kernel code generation leveraging parallel token generation and non-sequential refinement.
Parameter-efficient fine-tuning method viewing adaptation as neuromodulation-inspired mode selection and rescaling of pretrained computations.
U-Former ODE architecture for fast probabilistic forecasting of irregularly sampled time series using neural differential equations.
ML framework for reliable network traffic forecasting using deep learning adapted for traffic characteristics.
Multi-tenant model serving system handling seamless model updates with dynamic decision threshold management.
Framework treating temperature as adaptive meta-policy in LLM reinforcement learning to improve exploration-exploitation tradeoff.
Method for fair classification without demographic information using spectral uncertainty sets.
Benchmark for evaluating LLM safety under repeated inference via prompt stress testing, addressing consistency failures in deployment.
Method for learning stochastic partial differential equations from spatiotemporal observations using latent-variable formulation and deep learning.
Enhances off-policy RL sample efficiency by constraining initial representations to address distribution shift and stabilize training.