Ax Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf, Haifeng Ruan, Ridwan Shariffdeen, Abhik Roychoudhury 3/25/2026

Code Review Agent Benchmark

Introduces a benchmark dataset and evaluation framework for code review agents, addressing code quality assurance as AI-generated code scales.

Ax Haoran Yuan, Weigang Yi, Zhenyu Zhang, Wendi Chen, Yuchen Mo, Jiashi Yin, Xinzhuo Li, Xiangyu Zeng, Chuan Wen, Cewu Lu, Katherine Driggs-Campbell, Ismini Lourentzou 3/25/2026

VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Proposes VTAM, which extends video-action models for embodied AI with tactile sensing to support contact-rich physical interaction beyond vision-only approaches.

Ax Ufaq Khan, Umair Nawaz, L D M S S Teja, Numaan Saeed, Muhammad Bilal, Yutong Xie, Mohammad Yaqub, Muhammad Haris Khan 3/25/2026

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage

Evaluates the ability of vision-language models (VLMs) to perform pre-diagnostic sanity checks in medical imaging, identifying gaps between fluent text generation and safe visual understanding.

Ax Nan Huo, Xiaohan Xu, Jinyang Li, Per Jacobsson, Shipei Lin, Bowen Qin, Binyuan Hui, Xiaolong Li, Ge Qu, Shuzheng Si, Linheng Han, Edward Alexander, Xintong Zhu, Rui Qin, Ruihan Yu, Yiyao Jin, Feige Zhou, Weihao Zhong, Yun Chen, Hongyu Liu, Chenhao Ma, Fatma Ozcan, Yannis Papakonstantinou, Reynold Cheng 3/25/2026

BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions

BIRD-INTERACT benchmark evaluating LLMs on multi-turn text-to-SQL tasks with dynamic interactions and error handling.

Ax Raj Ghugare, Roger Creus Castanyer, Catherine Ji, Kathryn Wantlin, Jin Schofield, Karthik Narasimhan, Benjamin Eysenbach 3/25/2026

BuilderBench: The Building Blocks of Intelligent Agents

BuilderBench benchmark for evaluating AI agents' ability to learn through exploration and interaction beyond training data patterns.

Ax Sheng Liu, Long Chen, Zeyun Zhao, Qinglin Gou, Qingyue Wei, Arjun Masurkar, Kevin M. Spiegler, Philip Kuball, Stefania C. Bray, Megan Bernath, Deanna R. Willis, Jiang Bian, Lei Xing, Eric Topol, Kyunghyun Cho, Yu Huang, Ruogu Fang, Narges Razavian, James Zou 3/25/2026

Cerebra: A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment

Cerebra: multi-agent AI system with specialized agents for EHR, clinical notes, and multimodal data in dementia assessment.

Ax Yunni Qu (Department of Computer Science, University of North Carolina at Chapel Hill), Bhargav Vaduri (Department of Computer Science, University of North Carolina at Chapel Hill), Karthikeya Jatoth (Department of Computer Science, University of North Carolina at Chapel Hill), James Wellnitz (Eshelman School of Pharmacy, University of North Carolina at Chapel Hill), Dzung Dinh (Department of Computer Science, University of North Carolina at Chapel Hill), Seth Veenbaas (Eshelman School of Pharmacy, University of North Carolina at Chapel Hill), Jonathan Chapman (Eshelman School of Pharmacy, University of North Carolina at Chapel Hill), Alexander Tropsha (Eshelman School of Pharmacy, University of North Carolina at Chapel Hill), Junier Oliva (Department of Computer Science, University of North Carolina at Chapel Hill) 3/25/2026

Reliable OOD Virtual Screening with Extrapolatory Pseudo-Label Matching

Method for reliable out-of-distribution virtual screening in drug discovery using extrapolatory pseudo-label matching.

Ax Ekaterina Kochetkova, Kshiteej Sheth, Insu Han, Amir Zandieh, Michael Kapralov 3/25/2026

Streaming Attention Approximation via Discrepancy Theory

BalanceKV: streaming algorithm using discrepancy theory to approximate attention for efficient long-context LLM token generation.

Ax Enrico Parisini, Tapabrata Chakraborti, Chris Harbron, Ben D. MacArthur, Christopher R. S. Banerji 3/25/2026

Leakage and Interpretability in Concept-Based Models

Information-theoretic framework to characterize and quantify information leakage in concept-based models for interpretability.

Ax Lorenzo Steccanella, Joshua B. Evans, Özgür Şimşek, Anders Jonsson 3/25/2026

Learning The Minimum Action Distance

Learns a minimum action distance metric from state trajectories alone, capturing the structure of an MDP's environment without rewards or action labels.

Ax Lu Han, Yu Liu, Lan Li, Qiwen Deng, Jian Jiang, Yinbo Sun, Zhe Yu, Binfeng Wang, Xingyu Lu, Lintao Ma, Han-Jia Ye, De-Chuan Zhan 3/25/2026

UniCA: Unified Covariate Adaptation for Time Series Foundation Model

UniCA unifies covariate adaptation for time series foundation models to handle heterogeneous covariates, including categorical and multimodal data.