Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie, Ido Hakimi, Barna Pásztor, Andreas Krause 3/2/2026

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

A unified framework for quantifying uncertainty in the reward models used to align LLMs with human preferences, with the goal of reducing annotation costs.

Rishabh Kabra, Maks Ovsjanikov, Drew A. Hudson, Ye Xia, Skanda Koppula, Andre Araujo, Joao Carreira, Niloy J. Mitra 3/2/2026

A Mixed Diet Makes DINO An Omnivorous Vision Encoder

Pre-training method for DINO vision encoders that improves cross-modal feature alignment, for example between RGB images and depth maps.

Haritz Puerto, Haonan Li, Xudong Han, Timothy Baldwin, Iryna Gurevych 3/2/2026

Controllable Reasoning Models Are Private Thinkers

Method for training reasoning models to follow instructions within their reasoning traces, preventing unintended leakage of private information when AI agents process sensitive user data.

Ali Behrouz, Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni 3/2/2026

Memory Caching: RNNs with Growing Memory

Exploration of recurrent architectures with growing memory as subquadratic alternatives to Transformers for sequence modeling.

Weinan Dai, Hanlin Wu, Qiying Yu, Huan-ang Gao, Jiahao Li, Chengquan Jiang, Weiqiang Lou, Yufan Song, Hongli Yu, Jiaze Chen, Wei-Ying Ma, Ya-Qin Zhang, Jingjing Liu, Mingxuan Wang, Xin Liu, Hao Zhou 3/2/2026

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

CUDA Agent, a system that uses large-scale agentic RL to generate optimized GPU kernels, bridging the gap between LLMs and compiler-based systems.

Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas 3/2/2026

Do LLMs Benefit From Their Own Words?

Study comparing standard multi-turn prompting with user-turn-only prompting to determine whether LLMs benefit from their own prior responses.

Wenliang Li, Rui Yan, Xu Zhang, Li Chen, Hongji Zhu, Jing Zhao, Junjun Li, Mengru Li, Wei Cao, Zihang Jiang, Wei Wei, Kun Zhang, Shaohua Kevin Zhou 3/2/2026

MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM

Multi-agent system for clinical diagnosis that accumulates self-learned clinical knowledge across agent interactions for improved LLM performance.

Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuefeng Xiao, Hongyan Xie, Li Huaqiu, Songshi Liang, Zhongxiang Dai, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang 3/2/2026

Real-Time Aligned Reward Model beyond Semantics

Technique for keeping RLHF reward models aligned in real time, mitigating reward overoptimization so that the reward model continues to capture human intent.

Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuanda Wang, Zhixia Zhang, Hongyan Xie, Songshi Liang, Zehao Chen, Xuefeng Xiao, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang 3/2/2026

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Study of whether large reasoning models implicitly know when to stop thinking, addressing redundancy in long chains of thought.

Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng 3/2/2026

ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

ForesightSafety Bench evaluates frontier risks posed by autonomous AI systems whose behaviors are unpredictable and difficult to control.

Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang-Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, Yuri Chervonyi, Jonathan N. Lee, Garrett Bingham, Trieu H. Trinh, Vahab Mirrokni, Quoc V. Le, Thang Luong 3/2/2026

Aletheia tackles FirstProof autonomously

The Aletheia AI agent autonomously solved 6 of the 10 FirstProof mathematics challenges, using Gemini 3 Deep Think for reasoning.