Haocheng Ju, Guoxiong Gao, Jiedong Jiang, Bin Wu, Zeming Sun, Leheng Chen, Yutong Wang, Yuefeng Wang, Zichen Wang, Wanyi He, Peihao Wu, Liang Xiao, Ruochuan Liu, Bryan Dai, Bin Dong 27d ago

Automated Conjecture Resolution with Formal Verification

Framework for automated mathematical conjecture resolution that combines LLMs with formal verification to improve the reliability of research-level mathematical problem solving.

Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo 27d ago

SODA: Semi On-Policy Black-Box Distillation for Large Language Models

Semi on-policy knowledge distillation method for LLMs that balances off-policy simplicity with on-policy effectiveness and avoids the instability of adversarial training.

Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen 27d ago

Can LLMs Learn to Reason Robustly under Noisy Supervision?

Analysis of how LLM reasoning models behave under noisy labels in reinforcement learning with verifiable rewards, identifying vulnerabilities to label noise.

Haonian Ji, Kaiwen Xiong, Siwei Han, Peng Xia, Shi Qiu, Yiyang Zhou, Jiaqi Liu, Jinlong Li, Bingzhou Li, Zeyu Zheng, Cihang Xie, Huaxiu Yao 27d ago

ClawArena: Benchmarking AI Agents in Evolving Information Environments

ClawArena benchmark for evaluating AI agents in dynamic environments with evolving information, contradictions, and implicit user feedback.