Ax Hyejun Jeong, Amir Houmansadr, Shlomo Zilberstein, Eugene Bagdasarian 2/17/2026

Persuasion Propagation in LLM Agents

Studies persuasion propagation: how belief-level interventions affect downstream behavior in LLM agents executing long-horizon tasks.

Ax Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach 2/17/2026

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

AIRS-Bench: a benchmark of 20 ML research tasks for evaluating AI agent capabilities across language modeling, mathematics, bioinformatics, and time series forecasting.

Ax John Muchovej, Amanda Royka, Shane Lee, Julian Jara-Ettinger 2/17/2026

GPT-4o Lacks Core Features of Theory of Mind

Tests whether GPT-4o possesses Theory of Mind via causal model evaluation, finding it lacks core ToM representations.

Ax Tianyu Chen, Shuai Lu, Shan Lu, Yeyun Gong, Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Hao Yu, Nan Duan, Peng Cheng, Fan Yang, Shuvendu K Lahiri, Tao Xie, Lidong Zhou 2/17/2026

Automated Proof Generation for Rust Code via Self-Evolution

SAFE, a framework that automates formal proof generation for Rust code with LLMs, using self-evolution to overcome proof data scarcity.

Ax Jiacheng Cui, Zhaoyi Li, Xiaochen Ma, Xinyue Bi, Yaxin Luo, Zhiqiang Shen 2/17/2026

Dataset Distillation via Committee Voting

Proposes CV-DD, a committee voting approach for dataset distillation to create compact representative datasets for efficient model training.

Ax Federico Errica, Henrik Christiansen, Viktor Zaverkin, Mathias Niepert, Francesco Alesiani 2/17/2026

Adaptive Width Neural Networks

Technique for learning neural network layer width during training without manual hyperparameter tuning or architecture search.

Ax Xianrui Zhong, Bowen Jin, Siru Ouyang, Yanzhen Shen, Qiao Jin, Yin Fang, Zhiyong Lu, Jiawei Han 2/17/2026

Benchmarking Retrieval-Augmented Generation for Chemistry

Benchmark for retrieval-augmented generation in the chemistry domain, with curated evaluation datasets and domain-specific corpora.