Ax Tung-Long Vuong, Julien Monteil, Hien Dang, Volodymyr Vaskovych, Trung Le, Vu Nguyen 2/17/2026

On the Mechanisms of Collaborative Learning in VAE Recommenders

Theoretical analysis of collaborative learning in VAE-based recommender systems, showing latent proximity governs how binary masking improves performance.

Ax Chanakya Ekbote, Vijay Lingam, Sujay Sanghavi, Jun Huan, Behrooz Omidvar-Tehrani, Anoop Deoras, Stefano Soatto 2/17/2026

MURPHY: Multi-Turn GRPO for Self Correcting Code Generation

MURPHY: Multi-turn reinforcement learning framework for self-correcting code generation combining group relative policy optimization with execution verification.

Ax Chi-Yu Chen, Rawan Abulibdeh, Arash Asgari, Sebastián Andrés Cajas Ordóñez, Leo Anthony Celi, Deirdre Goode, Hassan Hamidi, Laleh Seyyed-Kalantari, Ned McCague, Thomas Sounack, Po-Chih Kuo 2/17/2026

Algorithms Trained on Normal Chest X-rays Can Predict Health Insurance Types

Study showing state-of-the-art vision models trained on normal chest X-rays can predict patient health insurance type, revealing encoded socioeconomic bias.

Ax Bidipta Sarkar, Mattie Fellows, Juan Agustin Duque, Alistair Letcher, Antonio León Villares, Anya Sims, Clarisse Wibault, Dmitry Samsonov, Dylan Cope, Jarek Liesen, Kang Li, Lukas Seier, Theo Wolf, Uljad Berdica, Valentin Mohl, Alexander David Goldie, Aaron Courville, Karin Sevegnani, Shimon Whiteson, Jakob Nicolaus Foerster 2/17/2026

Evolution Strategies at the Hyperscale

EGGROLL: Scalable evolution strategies algorithm using low-rank approximations to improve training efficiency of black-box optimization on GPUs.

Ax Yuepeng Sheng, Yuwei Huang, Shuman Liu, Anxiang Zeng, Haibo Zhang 2/17/2026

ESPO: Entropy Importance Sampling Policy Optimization

ESPO: Entropy importance sampling policy optimization for stable and efficient token-level RL training of LLMs on complex reasoning tasks at scale.

Ax Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li 2/17/2026

RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

RRPO: Robust reward policy optimization framework that prevents reward hacking in LLM-based emotional text-to-speech by addressing the vulnerability of vanilla reward models.

Ax Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, Andreas Krause 2/17/2026

Reinforcement Learning via Self-Distillation

Self-distillation approach for reinforcement learning leveraging rich textual feedback from verifiable environments to improve credit assignment in code/math tasks.

Ax Tao Yu, Haopeng Jin, Hao Wang, Shenghua Chai, Yujia Yang, Junhao Gong, Jiaming Guo, Minghui Zhang, Xinlong Chen, Zhenghao Zhang, Yuxuan Zhou, Yufei Xiong, Shanbin Zhang, Jiabing Yang, Hongzhu Yi, Xinming Wang, Cheng Zhong, Xiao Ma, Zhang Zhang, Yan Huang, Liang Wang 2/17/2026

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Benchmark for open-domain video shot retrieval using LLMs for understanding editing requirements and retrieving keyframe-oriented shots.

Ax David P. Woodruff, Vincent Cohen-Addad, Lalit Jain, Jieming Mao, Song Zuo, MohammadHossein Bateni, Simina Branzei, Michael P. Brenner, Lin Chen, Ying Feng, Lance Fortnow, Gang Fu, Ziyi Guan, Zahra Hadizadeh, Mohammad T. Hajiaghayi, Mahdi JafariRaviz, Adel Javanmard, Karthik C. S., Ken-ichi Kawarabayashi, Ravi Kumar, Silvio Lattanzi, Euiwoong Lee, Yi Li, Ioannis Panageas, Dimitris Paparas, Benjamin Przybocki, Bernardo Subercaseaux, Ola Svensson, Shayan Taherijam, Xuan Wu, Eylon Yogev, Morteza Zadimoghaddam, Samson Zhou, Yossi Matias, James Manyika, Vahab Mirrokni 2/17/2026

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques

Case studies of Google's Gemini models assisting scientific research including mathematical discovery and routine task automation.

Ax Emiliano Penaloza, Dheeraj Vattikonda, Nicolas Gontier, Alexandre Lacoste, Laurent Charlin, Massimo Caccia 2/17/2026

Privileged Information Distillation for Language Models

Studies knowledge distillation from privileged information in language models for multi-turn agentic environments, addressing inference-time capability transfer.

Ax Merlin de la Haye, Pascal Lenzner, Farehe Soheil, Marcus Wunderlich 2/17/2026

Metric Hedonic Games on the Line

Game-theoretic analysis of coalition formation in hedonic games using metric spaces.

Ax Babak Rahmani 2/17/2026

Debugging code world models

Analyzes errors and limitations in Code World Models that simulate program execution by predicting runtime state.

Ax Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen, Yue Dong, Yi Yang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Darren Liu, Jade Nie, Chunzhi Yang, Ellie Wen, Jiyan Yang, Huayu Li 2/17/2026

Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

Derives scaling laws for massive-scale recommendation systems through unified architecture design and efficiency improvements.