Learning to Play Blackjack: A Curriculum Learning Perspective
LLM-based curriculum learning framework for reinforcement learning agents applied to Blackjack game strategy.
Sparse interpretable machine learning models for improving branching decisions in mixed-integer programming solvers without GPU requirements.
Open-source adaptive router for multi-model LLM serving using cost-aware contextual bandits with non-stationary pricing and quality changes.
Epileptic seizure detection from EEG signals using graph convolutional neural networks on frequency band features.
Sit-to-stand transition detection using smart lacelock sensor for fall risk assessment in older adults.
Normalizing flow models using Lévy process distributions for heavy-tailed financial risk modeling.
Offline reinforcement learning from human feedback with multiple preference oracles for trading off performance with safety and fairness constraints.
Unsupervised neural network for 4D Flow MRI velocity field enhancement and phase wrapping correction using divergence-free parameterization.
Physical reservoir computing using Lead Zirconate Titanate for digit classification.
Alignment metrics for comparing neural network representations operating in superposition.
Diversity-aware reverse KL divergence method improving LLM distillation with large capacity mismatches.
Analysis of neural collapse dynamics identifying critical feature norm threshold for convergence.
MAC-Attention acceleration technique for LLM long-context decoding that preserves attention computation fidelity without compression.
Hierarchical flow matching framework for computationally efficient graph generation with reduced complexity.
Knowledge-Data ML framework integrating numeric data with knowledge for model construction.
Apprenticeship learning from imperfect demonstrations with evolving rewards in e-learning contexts.
Research on shuffling strategies for stochastic gradient descent optimization with convergence analysis.
Reinforcement learning framework for autonomous solver selection in chemical kinetics integration.
Agent system using RL to select optimal deep generative models for tabular data synthesis.
Generative framework for subsurface velocity model synthesis using proxy posterior estimation.
Conditional decoding strategy (CASA) for improving safety alignment in multimodal LLMs against cross-modal attacks.
AI safety research using circuit analysis to study vulnerabilities in autonomous agents with filesystem and email access.
XGBoost model for startup founder success prediction using engineered features from career data.
Method for encoding graph structure into LLMs via graph pooling tokens for Graph Question Answering tasks.
Deep learning surrogate optimization for production control in stress-sensitive oil reservoirs.
Reinforcement learning approach for behavioral support in Type 1 Diabetes management and insulin dosing.
Gradient-based data valuation for curriculum learning in game-theoretic motion planning using TracIn scoring.
Study showing deep networks assign higher density to simpler out-of-distribution data than in-distribution test data.
Tuning-free GNN prompting framework for cross-graph adaptation without task-specific parameter updates.
Membership inference attack on LLMs via gradient-induced feature drift to detect training data exposure.
Distributed optimization algorithm for Byzantine-resilient gradient tracking with probabilistic edge dropout.
Lagrangian Descriptors framework for evaluating neural network models of Hamiltonian dynamics.
Dimension reduction research exploring multiple valid embeddings for high-dimensional data visualization.
Research on scheduling LLM inference using uncertainty-aware output length predictions instead of point estimates.
arXiv: Generalization bounds for overparameterized shallow neural networks using initialization-dependent distance norms.
arXiv: Decoupled basis-vector-driven generative framework for dynamic multi-objective optimization addressing irregular mutations and cold-start.
arXiv: MOON3.0 multimodal representation learning framework for fine-grained e-commerce product understanding using reasoning-aware embeddings.
arXiv: First algorithm for Lipschitz dueling bandits over continuous action spaces using adaptive reference arms.
arXiv: Multi-format quantization-aware training enables single model robustness across multiple numeric precisions for elastic inference.
arXiv: Multi-task representation learning in linear bandits with shared latent representations for knowledge transfer.
arXiv: Test-time adaptation for LLMs under continual distribution shift and open-set tasks, preserving source knowledge.
arXiv: HabitatAgent multi-agent LLM system for housing consultation with transparent reasoning and factuality guarantees.
arXiv: Study on how representation choice affects interpretation of protein conformational dynamics from molecular dynamics simulations.
arXiv: Sparse Identification Graph Neural Network for discovering interpretable governing equations in ultra-large complex systems.
arXiv survey: On-policy distillation transfers reasoning from frontier LLMs to smaller models, addressing exposure bias in knowledge distillation.
arXiv: Prompt-based online continual learning for next activity prediction in dynamic processes, with mitigation of catastrophic forgetting.
arXiv: Variational Neural Stochastic Differential Equations model complex socioeconomic time-series data with heterogeneous dynamics.
arXiv: Full-gradient successor feature representations improve convergence guarantees for transfer learning in RL with non-linear function approximation.
arXiv: Empirical comparison of neural operator surrogates including Fourier neural operators vs polynomial methods for parametric PDEs.
arXiv: Hint-based Group Relative Policy Optimization addressing advantage collapse in reinforcement learning with verifiable rewards.