Endogenous Information in Routing Games: Memory-Constrained Equilibria, Recall Braess Paradoxes, and Memory Design
Game-theoretic analysis of routing decisions under memory constraints and endogenous information recall, using logit choice models.
Multi-ORFT stabilizes online reinforcement fine-tuning for multi-agent diffusion models in cooperative autonomous driving scenarios.
Analysis of discourse diversity in multi-turn empathic dialogue, examining LLM formulaicity beyond single-turn settings.
Grounded world models for visuomotor planning using pretrained vision encoders, enabling semantic generalization without explicit goal images.
StarVLA-α simplifies Vision-Language-Action models for robotic agents by studying unified design choices across architectures and training data.
Efficient KernelSHAP explainability method for patch-based 3D medical image segmentation with reduced computational cost.
Benchmark for evaluating general reasoning capabilities of LLMs across diverse challenging tasks beyond domain-specific reasoning.
Full-stack infrastructure for training, evaluating, and deploying GUI agents with online RL and unified evaluation framework.
Runtime security framework protecting tool-augmented LLM agents against indirect prompt injection attacks through tool-returned content.
Mechanistic analysis of internal dynamics in looped reasoning language models versus standard feedforward models.
Benchmark dataset for detecting AI-generated Chinese text with evaluation across multiple LLM architectures.
Deep learning method for uncertainty quantification in clinical radiotherapy segmentation using budget-aware constraints.
RL approach for training physics reasoning models on simulators to address lack of large-scale QA datasets in physics domain.
Evaluation of LLM causal reasoning capabilities using real-world complex texts with implicit causal relationships.
Benchmark evaluating VLMs' strategic reasoning abilities in multi-agent environments with multimodal observations.
Three-stage pipeline for disambiguation-centric finetuning of enterprise tool-calling LLMs to reduce errors with near-duplicate tools.
Multi-agent LLM system for automated academic poster generation from papers incorporating design and aesthetic principles.
Benchmark and framework for evaluating LLM-driven persuasive dialogue for health behavior change in insulin delivery adoption.
GUI agent framework for multi-step e-commerce risk management handling stateful interactions with dynamic web content.
Interactive learning approach enabling LLMs to improve reasoning through multi-agent interactions during inference without re-execution.
Reward learning method deriving progress estimation signals from passive videos for robotics RL tasks without manual reward engineering.
RL method for improving reasoning in diffusion-based language models using denoising process rewards instead of outcome-only rewards.
Multi-agent LLM system for iterative narrative script refinement using divide-and-conquer approach to improve long-form creative content generation.
RL framework for e-commerce search relevance using stepwise reward optimization to improve LLM-based query-product matching beyond SFT/DPO limitations.
Graph-coarsening strategy for Capacitated Vehicle Routing Problem with time windows using multilevel aggregation and quantum/classical solvers for large-scale logistics optimization.
MGA, a memory-driven GUI agent, reduces context overload and architectural redundancy by managing sequential trajectory history for improved long-horizon end-to-end automation.
Audits MedCalc-Bench clinical labels using physician-in-the-loop stewardship to assess reliability of LLM-synthesized reference labels in ML benchmarks.
PRISM framework disentangles SFT and RL training data via gradient concentration to diagnose learning needs and optimize data allocation for LLM agent training.
AgencyBench evaluates LLM-based autonomous agents on long-horizon real-world scenarios with 1M-token context windows, enabling scalable automated evaluation without human-in-the-loop.
Risk Awareness Injection method calibrates vision-language models against multimodal jailbreak attacks without fine-tuning or token manipulation, preserving model utility.
ANCHOR framework generates high-quality synthetic training data for GUI agents by trajectory expansion from seed demonstrations to create diverse, goal-consistent interaction data.
Constrained Assumption-Based Argumentation (CABA) extends ABA frameworks beyond propositional atoms to support variable-based arguments for structured argumentation.
AI agent system for pharmaceutical drug asset scouting across global non-English channels to identify novel drug development opportunities via multi-source intelligence.
FlexMS benchmark framework for evaluating deep learning mass spectrum prediction tools in metabolomics for drug discovery and molecular property identification.
Nano-EmoX proposes a three-level cognitive hierarchy (perception, understanding, interaction) for unified multimodal emotional intelligence in language models with empathy capabilities.
Diagnostic framework for LLM agent memory systems comparing write strategies, retrieval methods, and utilization behavior to identify performance bottlenecks across memory components.
Analyzes whether AI systems fail similarly to humans using error alignment metrics on out-of-distribution data to assess cognitive similarity and decision-making strategies.
NormCoRe framework studies how norms emerge in multi-agent AI systems through deliberation and negotiation using replication-by-translation methodology for fairness-sensitive domains.
dTRPO algorithm reduces trajectory probability calculation costs for policy optimization of diffusion-based LLMs, enabling scaled offline RL training for preference alignment.
Method for tracking internal states of LLMs across conversations using self-report-inspired techniques for safety, interpretability, and model welfare without white-box compression.
Manifesto proposing Agentic Business Process Management (APM) framework extending BPM to govern autonomous agents executing organizational processes with agent-oriented abstractions.
Maximum entropy methods for generating synthetic populations matching multi-way constraints from aggregate statistics, applied to microsimulation and privacy-preserving data release.
Large-scale empirical study analyzing 2,000+ publications on reinforcement learning environments, proposing a taxonomy of RL environment evolution and technological trends.
Examines ethical front-end design choices in conversational AI systems, focusing on user interaction and representation rather than backend algorithmic issues.
AIRA_2 addresses three bottlenecks in AI research agents (synchronous GPU execution, generalization gaps, and fixed LLM operator limitations) through improved architectural design.
AutoMS is a multi-agent neuro-symbolic framework using LLMs as semantic navigators for evolutionary search in inverse microstructure design, addressing topology optimization challenges.
CoEvoSkills framework enables LLM agents to self-evolve structured multi-file skill artifacts through co-evolutionary verification without manual authoring.
Deep RL framework optimizes land-use allocation in Lake Malawi Basin to maximize ecosystem service value with ecological constraints.
Position paper on failure modes in agentic IR systems, analyzing error cascades that arise in multi-step reason-act-observe workflows despite linguistic fluency.
Hierarchical multi-agent RL framework for reconfigurable intelligent surfaces removes need for channel state information estimation.