Ax Xinyu Lu, Kaiqi Zhang, Jinglin Yang, Boxi Cao, Yaojie Lu, Hongyu Lin, Min He, Xianpei Han, Le Sun 3/24/2026

P^2O: Joint Policy and Prompt Optimization

Jointly optimizes RL policies and LLM prompts to improve reasoning with verifiable rewards on hard samples.

Ax Yurong Chen, Zhiyi Huang, Michael I. Jordan, Haipeng Luo 3/24/2026

Calibeating Made Simple

A theoretical framework that reduces the calibration of forecasts to online learning techniques, with results for general proper losses.