Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment
Framework for learning inspectable alignment through inverse RL without direct policy modification, improving reusability and transparency.
Soft advantage policy optimization using smooth gate functions instead of hard clipping for stable LLM training and reasoning.
Comprehensive benchmark comparing state space models, transformers, and recurrent networks for US power grid electricity demand forecasting.
Continual learning architecture for LLMs preventing catastrophic forgetting during sequential updates using thalamically routed cortical columns.
Offline reinforcement learning with parametric policies under general function approximation beyond state-wise mirror descent.
Federated learning algorithm addressing statistical heterogeneity and non-IID data with proximal-balanced scaling for privacy-preserving training.
Sample-efficient hypergradient estimation for decentralized bi-level reinforcement learning in strategic decision-making and environment design.
Masked discrete diffusion model with self-aware Markov transition kernels enabling adaptive reasoning and error correction in discrete tasks.
Stable end-to-end joint embedding predictive architecture learning world models from raw pixels without representation collapse.
Multi-scale convolutional architectures for time series classification using diverse input representations and multi-representation learning.
Theoretical framework for population-based neural network training combining fast within-model optimization with slower population-level adaptation.
Multi-task supervised fine-tuning algorithm addressing heterogeneous overfitting across dataset mixtures with overfitting-aware data allocation.
Precipitation nowcasting model combining radar observations with weather foundation model priors to improve long-lead forecasting accuracy.
Analysis of systematic biases in the Chinchilla scaling-law fitting method applied to LLM training, showing parameter allocation errors in compute-optimal estimates.
Cloud-edge collaborative system for photovoltaic power forecasting using large models with latency constraints and robustness to weather distribution shifts.
Method for routing prompts to optimal LLMs/generative models using diversity-aware adaptive selection beyond fidelity scores.
Survey on enterprise financial risk prediction using big data and LLMs, covering AI/computer science approaches to finance and management risk analysis.
Self-supervised deep learning system for cardiac MRI analysis. Vision model trained via contrastive learning from visual concepts and text descriptions.
Dynamic pruning method to accelerate matrix factorization for recommendation systems. Reduces computational complexity in collaborative filtering with large user/item bases.
Theoretical study of feature learning in Leaky ResNets via Hamiltonian mechanics. Analyzes representation geodesics and bottleneck structures in infinite-depth limits.
Set2Seq Transformer for temporal multiple-instance learning with permutation-invariant set representations. Models internal structure and temporal relationships across timesteps.
Instance-level reasoning for generalized referring segmentation. Reformulates GRES to predict instance-aware masks with phrase-to-visual correspondence.
Coded computing schemes for distributed systems with probabilistic stragglers. Extends exact computation frameworks to handle approximate recovery scenarios.
Causal framework for evaluating LLMs controlling for randomization in token generation. Proposes coupled generation model for fair model comparison and ranking.
Framework integrating ML prediction uncertainty into online algorithm design. Uses calibration to leverage prediction-level confidence in algorithms with predictions.
Multi-agent optimization for UAV-assisted LoRa IoT gateways. Addresses energy efficiency in next-generation IoT networks.
Neural transport methods to accelerate Parallel Tempering MCMC sampling. Improves sample efficiency on high-dimensional and multimodal distributions.
Model-free RL framework for human motion imitation with musculoskeletal constraints. Improves on torque-controlled humanoids by modeling biomechanical realism.
Gen-C: Generative framework for simulating high-level crowd behaviors in virtual environments. Captures agent-agent and agent-environment interactions over time.
VidhikDastaavej: Model-agnostic wrapper for automated legal document generation in Indian context. Introduces large-scale anonymized dataset for long-form legal drafting.
3D Gaussian Splatting technique for wideband RF signal modeling across multiple frequency bands. Extends single-frequency 3DGS to handle diverse RF environments.
RNN-based control system design using linear matrix inequalities for output-feedback and state-feedback control. Applies incremental input-to-state stability (ISS) for robust tracking.
Theoretical analysis of generalization in one-hidden-layer neural networks using teacher-student framework. Provides complete characterization for generic activation functions.
Unified agent framework (NaviMaster) handling both GUI navigation and embodied navigation tasks via MDP formulation. First model to combine disparate domains with shared training paradigm.
Attention-based ML model predicting cloud performance under unknown workload in multi-tenant environments. Addresses resource contention in virtualized infrastructure.
Flow-matching model for 3D ligand generation and binding affinity prediction in drug discovery. SE(3)-equivariant architecture with multi-endpoint prediction capabilities.
Declarative OS interfaces for computer-use agents to replace GUIs, enabling LLMs to execute high-level goals with fewer API calls and less decomposition.
SyTTA: Label-free test-time adaptation for LLMs in specialized domains using only 4 extra tokens to mitigate distribution shifts.
Method for co-evolving test sets and prompts to refine LLM behavior, enabling iterative refinement of domain-specific policies without manual tuning.
Information pricing problem for selling high-dimensional proprietary data with decision-making buyers and monopolistic sellers.
Study of reliability breakdown prediction in 5G railway networks using CNN, LSTM, XGBoost, and transformer-based time series models.
Universal denoising method for signal recovery when noise distribution is unknown, using distributional shrinkage beyond Tweedie's formula.
Theoretical analysis of deep neural networks as a convex computation paradigm, examining how DNNs implement Occam's razor through circuit size minimization.
Vision-Language-Action models for robotic manipulation using Tweedie discrete diffusion to improve generalization and action control.
Frame selection method for long-form video understanding with Large Multimodal Models, reducing computational cost of processing dense video tokens.
Collaborative causal sensemaking framework for LLM-based decision support agents enabling human-AI partnerships in expert settings.
Generative Adversarial Reasoner: adversarial reinforcement learning framework improving LLM reasoning and reducing calculation errors.
LatentNN addresses attenuation bias in neural networks through latent variable treatment for improved extreme value estimation.
Machine learning framework for image-caption rating using comparative judgments instead of direct rating annotations.
ShapBPT computes pixel-level feature attributions using hierarchical Shapley values with multiscale image structure.