Bootstrapping Task Spaces for Self-Improvement
Presents Exploratory Iteration (ExIt), a family of RL methods that lets agents self-improve through iterative refinement without a fixed iteration limit.
Explores exogenous variable modeling in spatio-temporal forecasting systems to improve prediction accuracy.
Data augmentation strategies for generative recommendation systems improving generalization in sequential user behavior prediction.
Privacy-aware Bayesian network approach using credal sets for secure public release of probabilistic graphical models.
Multi-agent RL with curiosity-driven exploration using contextual calibration to distinguish novelty from environmental stochasticity.
DriftLite: Training-free particle-based approach for inference-time diffusion model adaptation to new distributions.
Error mitigation methods for post-training N:M activation sparsity in LLMs enabling dynamic input-adaptive compression.
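For context, a plain N:M activation mask (here 2:4) keeps the N largest-magnitude values in every group of M consecutive activations and zeros the rest. Because the mask is recomputed from each input, the compression is input-adaptive. The minimal sketch below shows only this baseline masking step and assumes magnitude-based selection over a flat vector. The helper name `nm_sparsify` is hypothetical, and the paper's error-mitigation methods are not reproduced.

```python
from typing import List


def nm_sparsify(x: List[float], n: int = 2, m: int = 4) -> List[float]:
    """Keep the n largest-magnitude entries in each consecutive group of m values.

    All other entries are set to zero. Sketch of a baseline N:M activation mask only.
    """
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    if len(x) % m != 0:
        raise ValueError("input length must be a multiple of m")
    out = [0.0] * len(x)
    for start in range(0, len(x), m):
        group = range(start, start + m)
        # Indices of the n entries with the largest absolute value in this group;
        # sorted() is stable, so ties keep their original order.
        keep = sorted(group, key=lambda i: abs(x[i]), reverse=True)[:n]
        for i in keep:
            out[i] = x[i]
    return out
```

For example, `nm_sparsify([0.1, -3.0, 2.0, 0.5, 1.0, 1.5, -0.2, 4.0])` keeps -3.0 and 2.0 in the first group and 1.5 and 4.0 in the second, zeroing the other four values.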
Aurora: Multimodal foundation model for cross-domain time series forecasting integrating text and temporal data.
SpinGPT applies an LLM-based approach to poker strategy, addressing the computational limits of counterfactual regret minimization (CFR) in multi-player games.
8-bit blockwise quantization of Muon optimizer states reducing memory overhead for large-scale LLM pretraining.
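Blockwise quantization gives each block of a tensor its own scale, so an outlier only coarsens the precision of its own block. The sketch below uses simple linear absmax int8 quantization as an assumed stand-in. The paper's exact code map, block size, and handling of Muon's optimizer state may differ, and `quantize_blockwise` and `dequantize_blockwise` are hypothetical names.

```python
from typing import List, Tuple


def quantize_blockwise(x: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Linear absmax int8 quantization with one float scale per block (sketch)."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]  # the last block may be shorter
        absmax = max(abs(v) for v in block)
        # Map the largest magnitude in the block to 127; all-zero blocks get a dummy scale.
        scale = absmax / 127 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Recover approximate values by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]
```

Storing one int8 code per element plus one scale per block cuts state memory roughly fourfold relative to float32 when blocks are large, with a per-element error of at most half the block's scale.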
Framework for standardizing evaluation of positive-unlabeled learning algorithms under consistent experimental settings.
Weather forecasting method using adaptive boundary alignment for regional and global predictions with spatial-temporal modeling.
Polychromic objectives for RL fine-tuning preventing policy collapse and preserving diversity in pretrained model behaviors.
Diffusion Alignment as Variational EM: a framework addressing reward over-optimization and mode collapse in diffusion model alignment.
Analysis of RL-induced parameter dynamics in LLMs revealing rank-1 dominance in reasoning improvements and predictability of training trajectories.
Surrogate-free ADMM method for LLM pruning achieving >50% sparsity without accuracy degradation, breaking through conventional compression limits.
Scaling-law formalization for language model pretraining that incorporates a data-quality parameter, extending the traditional model-size and dataset-size relationships.
KVComm: Communication framework for multi-agent LLM systems using selective key-value sharing instead of natural language or hidden states.
Fairness auditing framework for classifiers with partial feedback using cost-aware data acquisition strategies.
TROLL: Trust-region-based RL method that improves upon PPO-style clipping for LLM fine-tuning, yielding more stable training and higher final rewards.
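For reference, the PPO clipped surrogate that TROLL improves upon is, per token, min(r·A, clip(r, 1-ε, 1+ε)·A). Here r is the probability ratio between the new and old policy and A is the advantage. The sketch below computes that baseline objective (to be maximized) from log-probabilities. The function name and the default ε = 0.2 are assumptions, and TROLL's own trust-region mechanism is not shown.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate over tokens (baseline objective, to be maximized)."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                     # pi_new / pi_old
        clipped = min(max(ratio, 1 - eps), 1 + eps)   # clip(r, 1 - eps, 1 + eps)
        total += min(ratio * a, clipped * a)          # pessimistic of the two terms
    return total / len(advantages)
```

Once the ratio leaves [1-ε, 1+ε] in the direction the advantage favors, the clipped term caps the objective and its gradient with respect to the new policy vanishes. A trust-region method instead constrains the policy update explicitly rather than relying on this hard cutoff.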
Novel RL algorithm for diffusion LLMs using distribution matching policy optimization to improve reasoning capabilities and match autoregressive LLM performance.
Systematic interpretability study of five LLMs' medical knowledge using activation analysis and layer lesioning techniques.
Weak supervision model for healthcare fraud detection using knowledge-guided learning with limited labeled data.
Semantic search engine for the Lean theorem prover's mathlib library, using intent-aware ranking for theorem discovery.
Causal inference research on state-based identifiability of causal effects in treatment-outcome relationships.
Divide-and-conquer value estimation algorithm for offline goal-conditioned reinforcement learning.
Uses LLMs as the core component of Bayesian network structure discovery from data, replacing traditional structure-learning methods that require extensive observational data.
Method for adapting LLM agents to novel environments through test-time interaction, addressing syntactic and semantic mismatches in observation formats and state dynamics.
Leak@k: study showing that existing LLM unlearning methods, though successful under greedy-decoding evaluation, fail under probabilistic decoding.
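This sketch assumes that leak@k is defined analogously to pass@k, as the probability that at least one of k sampled generations reveals the supposedly forgotten content. Under that assumption it can be estimated without bias from n ≥ k samples, of which c leak. The sketch uses the pass@k-style estimator 1 - C(n-c, k) / C(n, k). The exact definition and the name `leak_at_k` are assumptions, not taken from the summary.

```python
from math import comb


def leak_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples leaks) from n samples with c leaks.

    Assumes a pass@k-style definition of leak@k (hypothetical helper).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one leaking sample.
        return 1.0
    # 1 - P(a random size-k subset contains no leaking sample)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With c = 1 leaking sample out of n = 10, leak@1 is 0.1 but leak@10 is 1.0. Leakage that rarely shows up in a single generation becomes likely when many generations are sampled.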
Blind-IGT: inverse game theory method jointly decoding rewards and rationality in entropy-regularized competitive games with unknown rationality parameter.
Quasimetric learning method for goal-conditioned RL using multi-step returns to estimate temporal distance between observations over long horizons.
FlowCast: conditional flow matching method for radar-based precipitation nowcasting addressing uncertainty and high-dimensional data modeling.
Gradient estimation method for multi-objective and meta reinforcement learning, partitioning n objectives into k groups for language model preference optimization.
InTAct: continual learning approach using interval-based task activation consolidation with mathematical guarantees against catastrophic forgetting.
MIST: neural network-based mutual information estimator trained on 625K synthetic distributions with known ground-truth MI.
E2E-GRec: framework for end-to-end joint training of GNNs and recommender systems, replacing the conventional two-stage pipeline.
SelfAI multi-agent system for self-directed long-horizon scientific discovery with human-in-the-loop workflows and exploration trade-offs.
ML-Tool-Bench framework for tool-augmented planning in autonomous ML agents orchestrating data analysis and model optimization workflows.
GRAPE framework unifying positional encoding mechanisms using group actions for multiplicative rotations and additive biases.
ECHO benchmark for evaluating graph neural networks on long-range graph propagation and interaction tasks.
Clustered personalized federated learning framework using Population Stability Index to handle non-IID data across clients.
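The Population Stability Index compares two binned distributions as PSI = Σ_i (a_i - e_i) ln(a_i / e_i). Here e_i and a_i are the fractions of a reference sample and a comparison sample that fall in bin i. PSI is 0 for identical binned distributions and grows as they diverge. The sketch below computes it for one feature with shared bin edges. How the framework turns client-level PSI values into clusters is not described here. The function name and the small floor `eps` for empty bins are assumptions.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI between two samples, binned by sorted interior cut points `edges`."""

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)  # len(edges) cut points -> len(edges) + 1 bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        n = len(values)
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(cnt / n, eps) for cnt in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

For example, a reference split of 50/50 across two bins against a comparison split of 75/25 gives PSI = 0.25·ln 3 ≈ 0.275.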
Soft-gated fractional mixture-of-experts with randomized adversarial training to defend ML models against adversarial attacks.
RL framework for adaptive precision tuning in linear solvers using contextual bandit approach to balance precision and efficiency.
HeurekaBench benchmarking framework for evaluating LLM-based AI co-scientist agents on end-to-end scientific analysis tasks.
Mathematical framework for polyphonic music generation using structural inductive bias and smart embeddings on Beethoven sonatas.
Multitask learning framework with denoising autoencoder for EEG signal analysis combining motor imagery and emotion recognition.
Mixture-of-experts model with self-augmentation for Quality of Service prediction in web service recommendation systems.
Method for inverting Self-Organizing Maps as generative models using activation patterns and distance geometry to reconstruct inputs.
Physics-informed inverse modeling framework for Arctic snow depth prediction combining process-based constraints with data-driven learning.
Machine unlearning method addressing long-tailed distributions in forget sets using forgetting-aware loss reweighting for privacy compliance.