Primal-dual natural actor-critic algorithm for constrained MDPs with neural network critics and general policy parameterization, enabling high-dimensional continuous control.
Theoretical analysis of greedy sparse learning algorithms examining convergence failure with step-size decay in matching pursuit and boosting methods.
Reverse Distillation framework addressing poor scaling in protein language models by decomposing large model representations using smaller model guidance.
FedShift: distributed adversarial attack on federated graph learning systems with two-stage hide-and-find approach for model poisoning.
GANRA: GPU-accelerated SMT solver combining LLMs and gradient descent for solving quantifier-free nonlinear real arithmetic problems.
MicroCoder-GRPO: improved training approach for code generation models using Group Relative Policy Optimization with conditional truncation masking for handling longer outputs.
ProgAgent: continual reinforcement learning agent that uses progress-aware reward learning from unlabeled expert videos to address catastrophic forgetting in robotic learning, with a JAX-based architecture.
arXiv paper investigating loss of plasticity in Vision Transformers for continual learning, examining why attention-based models struggle to adapt to new tasks over time.
Deep learning approach for multi-user MIMO wireless precoding using complex projective space parameterization of neural network outputs.
Temporal-difference reinforcement learning algorithm that incorporates gradients of bootstrapped estimates to improve stability over semi-gradient approaches.
Gradient-free guidance method for diffusion models in Bayesian inverse problems avoiding computationally expensive vector-Jacobian products.
Particle filtering analysis of inference-time aggregation and pruning methods for steering LLMs using process reward models to optimize accuracy-cost tradeoffs.
Decision-theory framework for designing probabilistic weather forecasts tailored to heterogeneous farmer decision-making contexts.
LLM-driven feature engineering pipeline for predicting job execution times in Databricks cloud systems to optimize cost allocation.
Bayesian Transformer framework for probabilistic power grid load forecasting with uncertainty quantification under distributional shifts.
Quantization technique for Vision-Language-Action models that adapts precision dynamically across inference stages to reduce computational overhead for edge deployment.
ELLMob generates human trajectories during large-scale events using an LLM framework with event-annotated mobility datasets that capture deviations from routine patterns.
PSTNet estimates atmospheric turbulence intensity using physics-structured ML models that respect conservation laws, for real-time aircraft safety applications.
$OneMillion-Bench evaluates language agents on 400 expert-curated real-world tasks across Law, Finance, Healthcare, Industry, and Science requiring multi-step reasoning and tool use.
MJ1 is a multimodal judge trained with RL to enforce visual grounding through structured verification chains and counterfactual consistency rewards.
Amortized MIPS uses neural networks to predict maximum inner product search solutions, reducing computational cost for fixed query and key distributions.
FedMomentum preserves optimization momentum during federated LoRA fine-tuning of LLMs through noise-free aggregation that maintains structural expressiveness.
Compute-efficient pipeline for data mixture scaling in LLM training, enabling extrapolation to large models without costly searches on target models.
Stabilized LoRA fine-tuning for federated LLM training using scaling factors to mitigate client heterogeneity effects and aggregation instability in distributed settings.
Adversarial domain adaptation for RNA-seq phenotype prediction addressing data scarcity through knowledge transfer between heterogeneous transcriptomic datasets.
Deterministic differentiable structured pruning method for LLMs using l0 sparsity constraints, eliminating train-test mismatch from stochastic relaxations in prior work.
Explores autoregressive tiny recursive models for general prediction tasks, extending the TRM mechanism beyond ARC-AGI to support iterative refinement in diverse domains.
EAGLE-Pangu implements tree speculative decoding for LLM acceleration on Ascend NPUs, optimizing inference speed through multi-token verification with hardware compatibility.
Demonstrates a safety vulnerability in LLMs where steganographic fine-tuning allows models to maintain a safety facade while covertly generating harmful content through hidden instructions.
Model-based offline RL method using adversarial model learning with adaptive weighting to mitigate model exploitation in policy exploration from limited offline data.
Probabilistic anomaly detection methodology for condition monitoring of helicopter transmissions using a Bayesian approach trained only on healthy operational data.
SAGAD addresses graph anomaly detection with a scalable GNN-based approach that handles homophily disparity and computational efficiency challenges in node classification.
DARC proposes an inference-time method for aligning LLMs with heterogeneous human preferences by framing response selection as a risk-constrained decoding problem, avoiding retraining.
JAX-based framework for training spiking neural networks with exact gradients via differentiable ODE solving, enabling flexible neuron models.
Theoretical analysis of classifier-free guidance in diffusion models with adaptive score discrepancy-based control for better conditional generation.
Critique of evaluation practices in long-term time series forecasting, questioning reliance on pointwise error metrics for progress assessment.
Taxonomy-informed representation learning for text-rich networks, leveraging hierarchical knowledge structures for better semantic understanding.
AutoAdapt: automated framework for domain adaptation in LLMs, handling hyperparameter selection and evolving knowledge without manual tuning.
SERQ: post-training quantization method for LLMs using saliency-aware low-rank error reconstruction for efficient deployment.
Operations research framework for sequential geographic service network expansion under capacity constraints and demand uncertainty.
Distributional regression using TabPFN and TabICL foundation models for tabular data with probabilistic scoring evaluation.
Evaluation of distance metrics for staleness measurement in asynchronous federated learning aggregation methods.
Wiener Chaos Expansion-based neural operator using FiLM for solving singular stochastic partial differential equations.
Fibration Policy Optimization introduces APC-Obj for training heterogeneous LLM systems with multi-scale hierarchical stability control.
Neural forecasting approach for predicting patient physiology to optimize antibiotic therapy transitions and reduce hospital stays.
FedPrism framework for federated learning with non-IID data, using adaptive personalization strategies under statistical heterogeneity.
Neural network-augmented calibration system for airborne magnetic anomaly navigation without extensive offline pre-training.
SCL-GNN addresses spurious correlations in graph neural networks to improve generalization across diverse graph tasks.
Physics-informed ML method (PolyFormer) for solving constrained optimization problems using a transformer architecture and geometric knowledge.
Theoretical analysis of implicit bias in Sharpness-Aware Minimization, showing depth-dependent behavior in linear networks that diverges from gradient descent.