SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
SEARL framework enables self-evolving agents through joint optimization of policy and tool graph memory, reducing reliance on large-scale LLMs.
HiL-Bench evaluates whether coding agents know when to request help with incomplete specifications, exposing judgment gaps in frontier models.
Empirical study of how LLM agents coordinate in multi-agent games, distinguishing baseline action similarity from strategic algorithmic monoculture.
AXIL derives an exact instance attribution method for gradient boosting machines, expressing predictions as weighted sums of training targets.
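A minimal sketch of the underlying idea, using a single regression tree rather than AXIL's full gradient-boosting treatment (all variable names here are illustrative, not from the paper): a tree predicts the mean of the training targets in the query's leaf, so its prediction is exactly a weighted sum of training targets with weights 1/|leaf| for co-leaf training instances and 0 elsewhere.

```python
# Illustration only (not the AXIL method): recover a regression tree's
# predictions as a weighted sum of training targets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(5, 3))

tree = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)

train_leaves = tree.apply(X_train)   # leaf id of each training row
test_leaves = tree.apply(X_test)     # leaf id of each query row

# Attribution weights: w[i, j] = 1/|leaf| if training row j shares a leaf
# with query row i, else 0.
W = (test_leaves[:, None] == train_leaves[None, :]).astype(float)
W /= W.sum(axis=1, keepdims=True)

# The weighted sum of training targets reproduces the tree's predictions.
assert np.allclose(W @ y_train, tree.predict(X_test))
```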
Proposes contrastive learning method for dialogue sentence embeddings using token-level template annotations instead of utterance-level labels.
SciTune framework aligns LLMs with scientific domain knowledge through instruction fine-tuning on multimodal scientific publication data.
MM-LIMA demonstrates that multimodal LLM fine-tuning achieves strong results with only 200 high-quality instruction examples, reducing data requirements.
Proposes CROP, a model-based offline reinforcement learning method addressing distribution shift through conservative reward estimation.
Uses predictive coding theory to decode and reconstruct language from fMRI brain signals, advancing neuroscience understanding of speech perception.
Framework using LLMs with philosophical relevance concepts to improve utility-based result ranking in retrieval-augmented generation systems.
Linear attention-based deep learning approach for multiplicative noise removal in radar and medical images.
Sample-efficient offline reinforcement learning for aircraft control using symmetric data augmentation to exploit system symmetries.
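A generic sketch of symmetry-based data augmentation under an assumed mirror symmetry (the state/action layout and the particular symmetry below are hypothetical, not taken from the paper): if the dynamics are invariant under reflecting a lateral state and the corresponding control, every logged transition yields a second valid transition for free, doubling the offline dataset.

```python
# Generic symmetry augmentation for an offline RL dataset of
# (state, action, reward, next_state) tuples.
import numpy as np

def mirror_transition(state, action, reward, next_state):
    """Reflect lateral components (assumed layout: [altitude, roll, speed] / [aileron, throttle])."""
    flip_s = np.array([1.0, -1.0, 1.0])   # hypothetical state reflection
    flip_a = np.array([-1.0, 1.0])        # hypothetical action reflection
    return state * flip_s, action * flip_a, reward, next_state * flip_s

def augment(dataset):
    """Append the mirrored copy of every logged transition."""
    return dataset + [mirror_transition(*t) for t in dataset]

logged = [(np.array([100.0, 0.3, 25.0]), np.array([0.1, 0.8]), 1.0,
           np.array([101.0, 0.2, 25.1]))]
print(len(augment(logged)))  # 2: original plus mirrored transition
```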
MegaFake dataset of LLM-generated fake news for studying how such misinformation is generated and for evaluating detection methods.
Deep Optimizer States method enables scalable training of transformer models using interleaved offloading to overcome memory constraints.
Framework for synthesizing realistic PCIe transaction layer packet traces using constrained generative AI for device development.
PoTable framework improves table reasoning in LLMs using plan-then-execute reasoning stages for systematic thinking.
WebLLM inference engine enabling high-performance LLM execution directly in web browsers for on-device deployment without server GPUs.
HumanVBench benchmark for evaluating human-centric video understanding in multimodal large language models with 16 fine-grained tasks.
Privacy-preserving federated framework for survival analysis using threshold homomorphic encryption across multiple institutions.
Three human studies examining whether humans can be influenced to conform to preference models used in RLHF algorithms for LLMs.
Novel curriculum learning approach for sample-efficient reinforcement learning applied to quadrotor stabilization control.
Combines semi-supervised and active learning for semantic segmentation to reduce manual annotation costs and improve model performance.
Proposes using LLMs to help mitigate barren plateaus in quantum neural network training through adaptive parameter initialization.
ExPath framework uses graph learning to infer biological pathways in knowledge bases, integrating experimental data for classification.
First robotic system using learning-based approaches for real-world piano playing, advancing manipulation capabilities in robotics.
AccidentSim framework generates physically realistic vehicle collision videos for autonomous driving research using real accident reports.
Study evaluating emergent lifelong learning behaviors in LLMs during multi-turn interactions, proposing new evaluation benchmarks for character-like consistency.
TARAC method addresses hallucinations in vision-language models by improving temporal attention mechanisms during generation without extensive retraining.
Research on energy-efficient optimization techniques for LLM deployment, including quantization and local inference strategies to reduce carbon emissions.
PODS decouples rollout generation from policy updates in LLM RL, addressing compute asymmetry through down-sampling.
LOOPE method learns optimal patch ordering in Vision Transformer positional embeddings for improved spatial information encoding.
Non-stationary diffusion model for time series forecasting using a Location-Scale Noise Model to capture time-varying uncertainty.
RL^V framework unifies LLM reasoners with verifiers using value functions for improved test-time compute scaling during reasoning.
Auto-Regressive Transformation method for image alignment handling feature-sparse regions and large deformations.
Bayesian approach for Vision Language Models to reduce hallucinations and overconfidence in VQA through selective prediction.
TokUR enables LLMs to self-assess uncertainty at the token level for improved reasoning and response reliability in multi-step tasks.
Sat2Sound framework predicts soundscape distribution using satellite images and vision-language models for geospatial audio understanding.
SpatialScore: comprehensive benchmark for evaluating spatial intelligence of multimodal LLMs with data-driven and agent-based assessment approaches.
GoT-R1: reinforcement learning framework enhancing multimodal LLM reasoning for complex visual generation with precise spatial relationships and attributes.
Fine-tuning approach for LLMs to predict diverse user behaviors, addressing overfitting to frequent behaviors while capturing long-tailed behavior distribution.
World models for interactive video generation with action conditioning and autoregressive decoding to support planning and future prediction.
Progressive multimodal network for quantifying fish feeding intensity in aquaculture using sensor fusion and conflict resolution between modalities.
Framework using LLMs for few-shot code generation to create safety-critical driving scenarios in CARLA simulator for autonomous driving evaluation.
Mathematical analysis of coarse-grained arithmetic applied to the St. Petersburg paradox in decision theory.
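For reference, the classical paradox (standard background, independent of the paper's coarse-grained treatment): a fair coin is flipped until the first head, paying 2^k when the first head lands on flip k, so the expected payoff diverges even though real bettors value the game at only a few units.

```latex
\mathbb{E}[\text{payoff}]
  = \sum_{k=1}^{\infty} \Pr(\text{first head on flip } k)\, 2^{k}
  = \sum_{k=1}^{\infty} \frac{1}{2^{k}}\, 2^{k}
  = \sum_{k=1}^{\infty} 1
  = \infty .
```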
LLM-based autonomous agent for power system voltage control, using experience-driven learning to generate dispatch strategies in distribution networks.
Data Mixing Agent: LLM-based method to automatically re-weight training data domains during continual pre-training, preventing catastrophic forgetting.
PRIX: efficient end-to-end autonomous driving model planning from raw camera pixels without LiDAR, reducing model size and computational requirements.
MDM-OC: framework for scalable, reversible model composition enabling continual learning without task interference or catastrophic forgetting.
Genetic programming approach for symbolic distillation of neural networks, using teacher-student smoothness alignment to improve explainable AI model accuracy.
Protocol for reliable evaluation of low-precision retrieval systems, addressing spurious ties and variability in relevance scoring with reduced numerical precision.