Ax Egor Denisov, Svetlana Glazyrina, Maksim Kryzhanovskiy, Roman Ischenko 2/24/2026

Smooth Gate Functions for Soft Advantage Policy Optimization

Proposes Soft Adaptive Policy Optimization (SAPO), which replaces hard clipping with smooth sigmoid gate functions to stabilize LLM training and reasoning in the GRPO framework.

Ax Daniel Ritter, Owen Oertell, Bradley Guo, Jonathan Chang, Kianté Brantley, Wen Sun 2/24/2026

LLMs Can Learn to Reason Via Off-Policy RL

Method for training LLMs to reason using off-policy reinforcement learning, addressing policy lag in distributed training architectures.

Ax Zelin He, Boran Han, Xiyuan Zhang, Shuai Zhang, Haotian Lin, Qi Zhu, Haoyang Fang, Danielle C. Maddix, Abdul Fatir Ansari, Akash Chandrayan, Abhinav Pradhan, Bernie Wang, Matthew Reimherr 2/24/2026

SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

Benchmark combining general reasoning LLMs with domain-specific time-series knowledge for improved time-series diagnostic reasoning tasks.

Ax Bryan Guanrong Shan, Alysa Ziying Tan, Han Yu 2/24/2026

Federated Learning Playground

Interactive browser-based educational platform for learning Federated Learning concepts with real-time visualization of heterogeneous data effects.

Ax Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau 2/24/2026

Grokking Finite-Dimensional Algebra

Investigation of the grokking phenomenon in neural networks learning multiplication in finite-dimensional algebras beyond group operations.

Ax Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Marc G. Bellemare, Alessandro Lazaric, Ahmed Touati 2/24/2026

Compositional Planning with Jumpy World Models

Compositional planning with world models enabling agents to compose pre-trained policies for solving complex tasks.

Ax Pablo Herrero Gómez, Antonio Jimeno Morenilla, David Muñoz-Hernández, Higinio Mora Mora 2/24/2026

Spectral Phase Encoding for Quantum Kernel Methods

Analysis of quantum kernel methods under data corruption with introduction of Spectral Phase Encoding technique.

Ax Hyunwoo Park 2/24/2026

I Dropped a Neural Net

Method to recover the correct ordering of shuffled neural network layers using only the dataset and layer information.

Ax Zhongwei Wan, Yun Shen, Zhihao Dou, Donghao Zhou, Yu Zhang, Xin Wang, Hui Shen, Jing Xiong, Chaofan Tao, Zixuan Zhong, Peizhou Huang, Mi Zhang 2/24/2026

DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning

DSDR: dual-scale diversity regularization method to improve exploration in LLM reasoning tasks with reinforcement learning from verifiers.

Ax Ghaith Mqawass (TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany, Machine Learning and Computational Sciences, Pfizer Research & Development, Berlin, Germany), Tuan Le (Machine Learning and Computational Sciences, Pfizer Research & Development, Berlin, Germany), Fabian Theis (TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany, TUM School of Computation, Information and Technology, Technical University of Munich, Germany, Institute of Computational Biology, Helmholtz Center Munich, Germany), Djork-Arné Clevert (Machine Learning and Computational Sciences, Pfizer Research & Development, Berlin, Germany) 2/24/2026

De novo molecular structure elucidation from mass spectra via flow matching

MSFlow: generative model using flow matching to perform de novo molecular structure elucidation from mass spectrometry data.