Ax Zhixing You, Jiachen Yuan, Jason Cai 3/20/2026

D-Mem: A Dual-Process Memory System for LLM Agents

Introduces D-Mem, a dual-process memory system for LLM agents enabling high-fidelity memory access for long-horizon reasoning and autonomous operation.

Ax Huichi Zhou, Siyuan Guo, Anjie Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, Runyu Yang, Qiangbin Liu, Xinlei Yu, Jianmin Zhou, Na Wang, Chunyang Sun, Jun Wang 3/20/2026

Memento-Skills: Let Agents Design Agents

LLM agent system that autonomously designs task-specific agents through memory-based RL and stateful prompts, forming a meta-agent framework with skill-based continual learning.

Ax Wenxuan Zhang, Lemeng Wu, Changsheng Zhao, Ernie Chang, Mingchen Zhuge, Zechun Liu, Andy Su, Hanxian Huang, Jun Chen, Chong Zhou, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Wei Wen 3/20/2026

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

Policy optimization technique for diffusion LLMs that reduces trajectory computation cost, improving the efficiency of preference alignment in generative language models.

Ax Hao Zhang, Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu, Jan Kautz, Yi Dong 3/20/2026

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

Service architecture for distributed RL training of multi-turn LLM agents. Decouples rollout orchestration from training for scalable agent development.

Ax Pranjal Aggarwal, Marjan Ghazvininejad, Seungone Kim, Ilia Kulikov, Jack Lanchantin, Xian Li, Tianjian Li, Bo Liu, Graham Neubig, Anaelia Ovalle, Swarnadeep Saha, Sainbayar Sukhbaatar, Sean Welleck, Jason Weston, Chenxi Whitehouse, Adina Williams, Jing Xu, Ping Yu, Weizhe Yuan, Jingyu Zhang, Wenting Zhao 3/20/2026

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Research on LLM reasoning over mathematical objects, such as deriving formal expressions, via on-policy reward modeling and test-time aggregation. Addresses structured reasoning in STEM domains.

Ax Matt Gorbett, Suman Jana 3/20/2026

Secure Linear Alignment of Large Language Models

Studies cross-model alignment of LLM representations for downstream objectives with applications in privacy-preserving and security-constrained settings.

Ax Diego Calvanese, Angelo Casciani, Giuseppe De Giacomo, Marlon Dumas, Fabiana Fournier, Timotheus Kampik, Emanuele La Malfa, Lior Limonad, Andrea Marrella, Andreas Metzger, Marco Montali, Daniel Amyot, Peter Fettke, Artem Polyvyanyy, Stefanie Rinderle-Ma, Sebastian Sardiña, Niek Tax, Barbara Weber 3/20/2026

Agentic Business Process Management: A Research Manifesto

Research manifesto proposing the Agentic Business Process Management paradigm, which extends BPM to govern autonomous agents that execute organizational processes.

Ax Chun-Jui Wang, Jian-Ting Guo, Hung Guei, Chung-Chin Shih, Ti-Rong Wu, I-Chen Wu 3/20/2026

Evaluating Game Difficulty in Tetris Block Puzzle

Uses Stochastic Gumbel AlphaZero to evaluate difficulty in Tetris Block Puzzle variants, applying a game-playing AI as an evaluator for puzzle design.

Ax Maksym Del, Markus Kängsepp, Marharyta Domnich, Ardi Tampuu, Lisa Yankovskaya, Meelis Kull, Mark Fishel 3/20/2026

How Uncertainty Estimation Scales with Sampling in Reasoning Models

Studies how uncertainty estimation scales with parallel sampling in reasoning models using self-consistency and verbalized confidence across mathematics and STEM tasks.

Ax Qiang Li, XiangRui Zhang, Haining Wang 3/20/2026

Implicit Patterns in LLM-Based Binary Analysis

Large-scale trace-level study showing that multi-pass LLM reasoning in binary vulnerability analysis exhibits structured, token-level exploration patterns across hundreds of steps.

Ax Zehao Li, Zhenyu Wu, Yibo Zhao, Bowen Yang, Jingjing Xie, Zhaoyang Liu, Zhoumianze Liu, Kaiming Jin, Jianze Liang, Zonglin Li, Feng Wu, Bowen Zhou, Zun Wang, Zichen Ding 3/20/2026

OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

OS-Themis: scalable multi-agent critic framework using decomposed trajectory milestones for training robust GUI agents with reinforcement learning.