Ax Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, Yuzhe Yang 3/17/2026

HEARTS: Benchmarking LLM Reasoning on Health Time Series

HEARTS benchmark for evaluating LLM reasoning on health time series across multiple physiological modalities and temporal dependencies.

Ax Peter Brodeur, Jacob M. Koshy, Anil Palepu, Khaled Saab, Ava Homiar, Roma Ruparel, Charles Wu, Ryutaro Tanno, Joseph Xu, Amy Wang, David Stutz, Wei-Hung Weng, Hannah M. Ferrera, David Barrett, Lindsey Crowley, Jihyeon Lee, Spencer E. Rittner, Ellery Wulczyn, Selena K. Zhang, Elahe Vedadi, Christine G. Kohn, Kavita Kulkarni, Vinay Kadiyala, Sara Mahdavi, Wendy Du, Jessica M. Williams, David Feinbloom, Renee Wong, Tao Tu, Petar Sirkovic, Alessio Orlandi, Christopher Semturs, Yun Liu, Juraj Gottweis, Dale R. Webster, Joëlle Barral, Katherine Chou, Pushmeet Kohli, Avinatan Hassidim, Yossi Matias, James Manyika, Rob Fields, Jonathan X. Li, Marc L. Cohen, Vivek Natarajan, Mike Schaekermann, Alan Karthikesalingam, Adam Rodman 3/17/2026

A prospective clinical feasibility study of a conversational diagnostic AI in an ambulatory primary care clinic

Clinical feasibility study of LLM-based conversational diagnostic AI (AMIE) in real primary care workflows with safety evaluation.

Ax Nicolas Schischka, Nikhil Gosala, B Ravi Kiran, Senthil Yogamani, Abhinav Valada 3/17/2026

Open-World Motion Forecasting

Motion forecasting for autonomous vehicles handling open-world scenarios with imperfect perception and evolving object taxonomies.

Ax Ruiying Li, Yunlang Zhou, YuYao Zhu, Kylin Chen, Jingyuan Wang, Sukai Wang, Kongtao Hu, Minhui Yu, Bowen Jiang, Zhan Su, Jiayao Ma, Xin He, Yongjian Shen, Yang Yang, Guanghui Ren, Maoqing Yao, Wenhao Wang, Yao Mu 3/17/2026

RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks

RoboClaw: Agentic framework unifying data collection, policy learning, and deployment for long-horizon robotic manipulation using Vision-Language-Action systems.

Ax Xingze Zou, Jing Wang, Yuhua Zheng, Xueyi Chen, Haolei Bai, Lingcheng Kong, Syed A. R. Abu-Bakar, Zhaode Wang, Chengfei Lv, Haoji Hu, Huan Wang 3/17/2026

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

MobileKernelBench benchmark evaluating LLM capability to generate efficient computational kernels for mobile devices, with systematic investigation of code generation limits.

Ax Duncan Eddy, Esen Yel, Emma Passmore, Niles Egan, Grayson Armour, Dylan M. Asmar, Mykel J. Kochenderfer 3/17/2026

Optimizing Task Completion Time Updates Using POMDPs

POMDP-based approach for optimizing when to update task completion time announcements in project management, balancing accuracy and stakeholder trust.

Ax Zheda Mai, Ke Zhang, Fu-En Wang, Zixiao Ken Wang, Albert Y. C. Chen, Lu Xia, Min Sun, Wei-Lun Chao, Cheng-Hao Kuo 3/17/2026

Revisiting Model Stitching In the Foundation Model Era

Revisits model stitching for Vision Foundation Models, testing representational compatibility across models with different training objectives and data sources.

Ax Dayuan Fu, Shenyu Wu, Yunze Wu, Zerui Peng, Yaxing Huang, Jie Sun, Ji Zeng, Mohan Jiang, Lin Zhang, Yukun Li, Jiarui Hu, Liming Liu, Jinlong Hou, Pengfei Liu 3/17/2026

daVinci-Env: Open SWE Environment Synthesis at Scale

Large-scale open-source software engineering environment for training AI agents with executable, verifiable tasks and dynamic feedback.

Ax Daniel Bretsko, Piotr Walas, Devashish Khulbe, Sebastian Stros, Stanislav Sobolevsky, Tomas Satura 3/17/2026

FastODT: A tree-based framework for efficient continual learning

Tree-based continual learning framework for non-stationary data distributions with constrained computational resources in time series applications.

Ax Thibault Formal, Maxime Louis, Hervé Dejean, Stéphane Clinchant 3/17/2026

Learning Retrieval Models with Sparse Autoencoders

Sparse autoencoders as a foundation for learned sparse retrieval, decomposing LLM representations into interpretable latent features for efficient document retrieval.

Ax Sunghyeon Woo, Jaeeun Kil, Hoseung Kim, Minsub Kim, Joonghoon Kim, Ahreum Seo, Sungjae Lee, Minjung Jo, Jiwon Ryu, Baeseong Park, Se Jung Kwon, Dongsoo Lee 3/17/2026

ICaRus: Identical Cache Reuse for Efficient Multi Model Inference

Multi-model inference optimization reusing identical KV caches across models to reduce memory consumption in agentic AI systems.

Ax Angelika Romanou, Mark Ibrahim, Candace Ross, Chantal Shaib, Kerem Okta, Sam Bell, Elia Ovalle, Jesse Dodge, Antoine Bosselut, Koustuv Sinha, Adina Williams 3/17/2026

Brittlebench: Quantifying LLM robustness via prompt sensitivity

Benchmark quantifying LLM robustness by measuring model sensitivity to prompt variations, typos, and paraphrases in real-world conditions.