Ax Shuofei Qiao, Yanqiu Zhao, Zhisong Qiu, Xiaobin Wang, Jintian Zhang, Zhao Bin, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen 3/16/2026

Scaling Generalist Data-Analytic Agents

DataMind: scalable data-analytic AI agents for automated discovery. Open-source agent framework that handles data files in diverse formats and performs multi-step reasoning.

Ax Hritik Bansal, Devendra Singh Sachan, Kai-Wei Chang, Aditya Grover, Gargi Ghosh, Wen-tau Yih, Ramakanth Pasunuru 3/16/2026

HoneyBee: Data Recipes for Vision-Language Reasoners

HoneyBee: data curation approaches for vision-language reasoning datasets. Analyzes impact of context, content, and format on VLM reasoning capabilities.

Ax Yash Jangir, Yidi Zhang, Pang-Chi Lo, Kashu Yamazaki, Chenyu Zhang, Kuan-Hsun Tu, Tsung-Wei Ke, Lei Ke, Yonatan Bisk, Katerina Fragkiadaki 3/16/2026

RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation

RobotArena ∞: scalable robot benchmarking via real-to-sim translation. Enables rigorous evaluation of robot policies across diverse tasks and environments.

Ax Xinwu Ye, Yicheng Mao, Jia Zhang, Yimeng Liu, Li Hao, Fang Wu, Zhiwei Li, Yuxuan Liao, Zehong Wang, Yingcheng Wu, Zhiyuan Liu, Zhenfei Yin, Li Yuan, Philip Torr, Huan Sun, Xiangxiang Zeng, Mengdi Wang, Le Cong, Shenghua Gao, Xiangru Tang 3/16/2026

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

LatentChem: latent reasoning interface for chemical LLMs. Decouples chemical computation from discrete tokens to improve efficiency and performance in chemical reasoning.

Ax Markus Knauer, Samuel Bustamante, Thomas Eiband, Alin Albu-Schäffer, Freek Stulp, João Silvério 3/16/2026

IROSA: Interactive Robot Skill Adaptation using Natural Language

IROSA: framework that combines foundation models with imitation learning to adapt robot skills through natural language. An application of LLMs to robotics.

Ax Linus Folkerts, Will Payne, Simon Inman, Philippos Giavridis, Joe Skinner, Sam Deverett, James Aung, Ekin Zorer, Michael Schmatz, Mahmoud Ghanem, John Wilkinson, Alan Steer, Vy Hong, Jessica Wang 3/16/2026

Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Benchmark evaluating frontier AI models on multi-step cyber attack scenarios. Agent capability measurement across extended action sequences.

HN tjohnell 3/15/2026

LLMs can be absolutely exhausting

Discussion of the mental fatigue and workflow challenges of working with LLMs such as Claude and Codex, along with strategies for recovering from them.

HN mooreds 3/15/2026

Securing AI Agents

Overview of layered security architecture for AI agents, emphasizing secure human identity verification and token-based authorization.

HN g_br_l 3/15/2026

Do you really need an agent?

Critical perspective on AI agent hype, questioning whether agents are actually necessary or are being overused in current implementations.