Ax Minqi Jiang, Andrei Lupu, Yoram Bachrach 2/24/2026

Bootstrapping Task Spaces for Self-Improvement

Presents Exploratory Iteration (ExIt), RL methods that enable agents to self-improve through iterative refinement without a fixed iteration limit.

Ax Geon Lee, Bhuvesh Kumar, Clark Mingxuan Ju, Tong Zhao, Kijung Shin, Neil Shah, Liam Collins 2/24/2026

Sequential Data Augmentation for Generative Recommendation

Data augmentation strategies that improve the generalization of generative recommendation systems in sequential user behavior prediction.

Ax Aman Gupta, Rafael Celente, Abhishek Shivanna, D. T. Braithwaite, Gregory Dexter, Shao Tang, Hiroto Udagawa, Daniel Silva, Rohan Ramanath, S. Sathiya Keerthi 2/24/2026

Effective Quantization of Muon Optimizer States

8-bit blockwise quantization of Muon optimizer states, reducing memory overhead for large-scale LLM pretraining.

Ax Jubayer Ibn Hamid, Ifdita Hasan Orney, Ellen Xu, Chelsea Finn, Dorsa Sadigh 2/24/2026

Polychromic Objectives for Reinforcement Learning

Polychromic objectives for RL fine-tuning that prevent policy collapse and preserve the diversity of pretrained model behaviors.

Ax Jaewoo Lee, Minsu Kim, Sanghyeok Choi, Inhyuck Song, Sujin Yun, Hyeongyu Kang, Woocheol Shin, Taeyoung Yun, Kiyoung Om, Jinkyoo Park 2/24/2026

Diffusion Alignment as Variational Expectation-Maximization

A framework that casts diffusion alignment as variational expectation-maximization (EM), addressing reward over-optimization and mode collapse.

Ax Nirjhar Das, Mohit Sharma, Praharsh Nanavati, Kirankumar Shiragur, Amit Deshpande 2/24/2026

Cost Efficient Fairness Audit Under Partial Feedback

Fairness auditing framework for classifiers under partial feedback, using cost-aware data acquisition strategies.

Ax Arthur Chen, Zuxin Liu, Jianguo Zhang, Akshara Prabhakar, Zhiwei Liu, Shelby Heinecke, Silvio Savarese, Victor Zhong, Caiming Xiong 2/24/2026

Test-Time Adaptation for LLM Agents via Environment Interaction

Method for adapting LLM agents to novel environments through test-time interaction, addressing syntactic and semantic mismatches in observation formats and state dynamics.

Ax Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Xiaobing Yu, Yu Zhong, Shangqi Deng, Ufaq Khan, Jianghao Wu, Xiaofeng Liu, Imran Razzak, Xiaojun Chang, Yutong Xie 2/24/2026

SelfAI: A self-directed framework for long-horizon scientific discovery

SelfAI, a multi-agent system for self-directed long-horizon scientific discovery, with human-in-the-loop workflows and exploration trade-offs.

Ax Yaswanth Chittepu, Raghavendra Addanki, Tung Mai, Anup Rao, Branislav Kveton 2/24/2026

ML-Tool-Bench: Tool-Augmented Planning for ML Tasks

ML-Tool-Bench, a framework for tool-augmented planning in autonomous ML agents that orchestrate data analysis and model optimization workflows.

Ax Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao 2/24/2026

Group Representational Position Encoding

GRAPE, a framework that uses group actions to unify positional encoding mechanisms, covering both multiplicative rotations and additive biases.

Ax Kecheng Cai, Chao Peng, Chenyang Xu, Xia Chen, Yi Wang, Shuo Shi, Qiyuan Liang 2/24/2026

Self-Augmented Mixture-of-Experts for QoS Prediction

Mixture-of-experts model with self-augmentation for Quality of Service prediction in web service recommendation systems.