Ax Ritish Shrirao, Aditya Priyadarshi, Raghuram Bharadwaj Diddigi 4/2/2026

Full-Gradient Successor Feature Representations

arXiv: Full-gradient successor feature representations improve convergence guarantees for transfer learning in RL with non-linear function approximation.

Ax Yu Xia, Canwen Xu, Zhewei Yao, Julian McAuley, Yuxiong He 4/2/2026

Learning to Hint for Reinforcement Learning

arXiv: Uses hints to address advantage collapse in Group Relative Policy Optimization (GRPO) for reinforcement learning with verifiable rewards.

Ax Huaiyang Wang, Xiaojie Li, Deqing Wang, Haoyi Zhou, Zixuan Huang, Yaodong Yang, Jianxin Li, Yikun Ban 4/2/2026

Policy Improvement Reinforcement Learning

Reinforcement learning approach with verification for iteratively improving LLM policies based on actual performance gains.

Ax Ken M. Nakanishi 4/2/2026

Screening Is Enough

Multiscreen mechanism for language models enabling absolute query-key relevance assessment beyond relative attention redistribution.