Ax Anrui Chen, Ruijun Huang, Xin Zhang, Fang Dong, Hengjie Cao, Zhendong Huang, Yifeng Yang, Mengyi Chen, Jixian Zhou, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Robert P. Dick, Yuan Cheng, Tun Lu, Fan Yang, Li Shang 2/16/2026

Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers

Analysis of multi-head attention as a source of catastrophic forgetting in mixture-of-experts (MoE) transformers.

Ax Jintao Zhang, Haoxu Wang, Kai Jiang, Kaiwen Zheng, Youhe Jiang, Ion Stoica, Jianfei Chen, Jun Zhu, Joseph E. Gonzalez 2/16/2026

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

SLA2 improves sparse-linear attention for diffusion models with learnable routing and quantization-aware training.

Ax Shreyas Fadnavis 2/16/2026

Leverage-Weighted Conformal Prediction

Conformal prediction method that uses statistical leverage to build adaptive prediction intervals without auxiliary models.

Ax George Alexandru Adam, Alexander Cui, Edwin Thomas, Emily Napier, Nazar Shmatko, Jacob Schnell, Jacob Junqi Tian, Alekhya Dronavalli, Edward Tian, Dongwon Lee 2/16/2026

GPTZero: Robust Detection of LLM-Generated Texts

Detection system for distinguishing AI-generated from human-authored text to prevent misinformation and content fraud.

Ax Gengsheng Li, Jinghan He, Shijie Wang, Dan Zhang, Ruiqi Liu, Renrui Zhang, Zijun Yao, Junfeng Fang, Haiyun Guo, Jinqiao Wang 2/16/2026

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training

Solution to diversity collapse in self-play LLM training, where challenger-solver loops degrade over iterations despite initial gains.

Ax Solveig Wittig, Antonis Vasileiou, Robert R. Nerem, Timo Stoll, Floris Geerts, Yusu Wang, Christopher Morris 2/16/2026

Which Algorithms Can Graph Neural Networks Learn?

Study of which discrete algorithms graph neural networks can learn, advancing the understanding of neural algorithmic reasoning.