Hehai Lin, Shilei Cao, Sudong Wang, Haotian Wu, Minzhi Li, Linyi Yang, Juepeng Zheng, Chengwei Qin

Interactive Learning for LLM Reasoning

An interactive learning approach that enables LLMs to improve reasoning through multi-agent interactions during inference, without re-execution.

Weihua Cheng, Junming Liu, Yifei Sun, Botian Shi, Yirong Chen, Ding Wang

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

The memory-driven MGA GUI agent reduces context overload and architectural redundancy by managing sequential trajectory history, improving long-horizon end-to-end automation.

Jinbiao Wei, Yilun Zhao, Kangqi Ni, Arman Cohan

ANCHOR: Branch-Point Data Generation for GUI Agents

The ANCHOR framework generates diverse, goal-consistent synthetic training data for GUI agents by expanding trajectories from seed demonstrations.

Emanuele De Angelis (CNR-IASI, Rome, Italy), Fabio Fioravanti (DEc, University 'G. d'Annunzio', Chieti-Pescara, Italy), Maria Chiara Meo (DEc, University 'G. d'Annunzio', Chieti-Pescara, Italy), Alberto Pettorossi (DICII, University of Rome 'Tor Vergata', Italy), Maurizio Proietti (CNR-IASI, Rome, Italy), Francesca Toni (Imperial, London, UK)

Constrained Assumption-Based Argumentation Frameworks

Constrained Assumption-Based Argumentation (CABA) extends ABA frameworks beyond propositional atoms, supporting variable-based arguments for structured argumentation.

Wenxuan Zhang, Lemeng Wu, Changsheng Zhao, Ernie Chang, Mingchen Zhuge, Zechun Liu, Andy Su, Hanxian Huang, Jun Chen, Chong Zhou, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Wei Wen

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

The dTRPO algorithm reduces the cost of trajectory probability calculation in policy optimization for diffusion-based LLMs, enabling offline RL training for preference alignment at scale.

Diego Calvanese, Angelo Casciani, Giuseppe De Giacomo, Marlon Dumas, Fabiana Fournier, Timotheus Kampik, Emanuele La Malfa, Lior Limonad, Andrea Marrella, Andreas Metzger, Marco Montali, Daniel Amyot, Peter Fettke, Artem Polyvyanyy, Stefanie Rinderle-Ma, Sebastian Sardiña, Niek Tax, Barbara Weber

Agentic Business Process Management: A Research Manifesto

A manifesto proposing the Agentic Business Process Management (APM) framework, which extends BPM to govern autonomous agents executing organizational processes via agent-oriented abstractions.

Karen Hambardzumyan, Nicolas Baldwin, Edan Toledo, Rishi Hazra, Michael Kuchnik, Bassel Al Omari, Thomas Simon Foster, Anton Protopopov, Jean-Christophe Gagnon-Audet, Ishita Mediratta, Kelvin Niu, Michael Shvartsman, Alisia Lupidi, Alexis Audran-Reiss, Parth Pathak, Tatiana Shavrina, Despoina Magka, Hela Momand, Derek Dunfield, Nicola Cancedda, Pontus Stenetorp, Carole-Jean Wu, Jakob Nicolaus Foerster, Yoram Bachrach, Martin Josifoski

AIRA_2: Overcoming Bottlenecks in AI Research Agents

AIRA_2 addresses three bottlenecks in AI research agents through improved architectural design: synchronous GPU execution, generalization gaps, and the limitations of fixed LLM operators.

Hanrong Zhang, Shicheng Fan, Henry Peng Zou, Yankai Chen, Zhenting Wang, Jiayu Zhou, Chengze Li, Wei-Chieh Huang, Yifei Yao, Kening Zheng, Xue Liu, Xiaoxiao Li, Philip S. Yu

CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

The CoEvoSkills framework enables LLM agents to self-evolve structured multi-file skill artifacts through co-evolutionary verification, without manual authoring.

Anushree Sinha, Srivaths Ranganathan, Debanshu Das, Abhishek Dharmaratnakar

Beyond Fluency: Toward Reliable Trajectories in Agentic IR

A position paper on failure modes in agentic IR systems, analyzing how error cascades arise in multi-step reason-act-observe workflows despite the systems' linguistic fluency.

Mohamed Elfeki, Tu Trinh, Kelvin Luu, Guangze Luo, Nathan Hunt, Ernesto Montoya, Nandan Marwaha, Yannis He, Charles Wang, Fernando Crabedo, Alessa Castilo, Bing Liu

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

HiL-Bench evaluates whether coding agents know when to request help on tasks with incomplete specifications, exposing judgment gaps in frontier models.