Ax Ruitong Li, Aisheng Mo, Guowei Su, Ru Zhang, Binjie Guo, Haohan Jiang, Xurong Lin, Hongyan Wei, Jie Li, Zhiyuan Qian, Zhuhao Zhang, Xiaoyuan Cheng 3/24/2026

AlphaZero-Edu: Democratizing Access to AlphaZero

Educational implementation of the AlphaZero reinforcement learning framework that addresses complexity and reproducibility challenges to make it more broadly accessible.

Ax Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song 3/24/2026

Learning to Reason without External Rewards

Intuitor: an LLM reasoning method that uses the model's internal confidence signals for RL, with no external rewards or labeled data.

Ax Sarah Lockfisch, Kristian Schwethelm, Martin Menten, Rickmer Braren, Daniel Rueckert, Alexander Ziller, Georgios Kaissis 3/24/2026

On Arbitrary Predictions from Equally Valid Models

Empirical analysis of predictive multiplicity: conflicting predictions for the same input across equally valid ML models.

Ax Sitong Chen, Shen Nie, Jiacheng Sun, Zijin Feng, Zhenguo Li, Ji-Rong Wen, Chongxuan Li 3/24/2026

Masked Diffusion Models as Energy Minimization

Theoretical framework interpreting masked diffusion models as solutions to energy minimization problems in discrete optimal transport.

HN sannysanoff 3/24/2026

LLM Can Be a Supercompiler

Case study of using an LLM to improve the performance of legacy Java code through refactoring suggestions.