Ax Seth Karten, Jake Grigsby, Tersoo Upaa Jr, Junik Bae, Seonghun Hong, Hyunyoung Jeong, Jaeyoon Jung, Kun Kerdthaisong, Gyungbo Kim, Hyeokgi Kim, Yujin Kim, Eunju Kwon, Dongyu Liu, Patrick Mariglia, Sangyeon Park, Benedikt Schink, Xianwei Shi, Anthony Sistilli, Joseph Twin, Arian Urdu, Matin Urdu, Qiao Wang, Ling Wu, Wenli Zhang, Kunsheng Zhou, Stephanie Milani, Kiran Vodrahalli, Amy Zhang, Fei Fang, Yuke Zhu, Chi Jin 3/18/2026

The PokeAgent Challenge: Competitive and Long-Context Learning at Scale

arXiv: PokeAgent benchmark for multi-agent AI decision-making with partial observability, game theory, and long-horizon planning in Pokemon RPG.

Ax Lin Lawrence Guo, Santiago Eduardo Arciniegas, Joseph Jihyung Lee, Adam Paul Yan, George Tomlinson, Jason Fries, Lillian Sung 3/18/2026

Tokenization Tradeoffs in Structured EHR Foundation Models

arXiv: Analyzes tokenization design choices for foundation models trained on structured electronic health records.

Ax Ningkang Peng, Qianfeng Yu, Xiaoqian Peng, Linjing Qian, Yafei Liu, Canran Xiao, Xinyu Lu, Tingyu Lu, Zhichao Zheng, Yanhui Gu 3/18/2026

How to Achieve Prototypical Birth and Death for OOD Detection?

Prototype-based OOD detection method with dynamic prototype count adaptation based on category complexity.

Ax Andres Potapczynski, Ravi Kiran Selvam, Tatiana Konstantinova, Shankar Ramasubramanian, Malcolm Wolff, Kin G. Olivares, Ruijun Ma, Mengfei Cao, Michael W. Mahoney, Andrew Gordon Wilson, Boris N. Oreshkin, Dmitry Efimov 3/18/2026

Time-Aware Prior Fitted Networks for Zero-Shot Forecasting with Exogenous Variables

Zero-shot forecasting method for time series with exogenous variables using prior-fitted networks.

Ax Hanxian Huang, Igor Fedorov, Andrey Gromov, Bernard Beckerman, Naveen Suda, David Eriksson, Maximilian Balandat, Rylan Conway, Patrick Huber, Chinnadhurai Sankar, Ayushi Dalmia, Zechun Liu, Lemeng Wu, Tarek Elgamal, Adithya Sagar, Vikas Chandra, Raghuraman Krishnamoorthi 3/18/2026

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale

Hardware-in-the-loop architecture search methodology for designing efficient on-device LLMs with real-time latency constraints for mobile deployment.

Ax Swadesh Jana, Cansu Sancaktar, Tomáš Daniš, Georg Martius, Antonio Orvieto, Pavel Kolev 3/18/2026

GASP: Guided Asymmetric Self-Play For Coding LLMs

Proposes a guided asymmetric self-play method for post-training coding LLMs, using better problem selection to improve model capabilities.

Ax Xiaolong Han, Ferrante Neri, Zijian Jiang, Fang Wu, Yanfang Ye, Lu Yin, Zehong Wang 3/18/2026

W2T: LoRA Weights Already Know What They Can Do

Analyzes whether LoRA checkpoint weights encode task performance information readable without running the base model, enabling efficient adapter analysis.

Ax Parikshit Gopalan, Konstantinos Stavropoulos, Kunal Talwar, Pranay Tankala 3/18/2026

The Importance of Being Smoothly Calibrated

Theoretical work on smooth calibration as a robust calibration measure and a step toward omniprediction guarantees.