Pavel Golikov, Evgenii Opryshko, Gennady Pekhimenko, Mark C. Jeffrey 2d ago

Robust Reasoning Benchmark

Benchmark evaluating robustness of LLM reasoning with 14 perturbation techniques applied to mathematical reasoning tasks.
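The summary does not list the 14 techniques, but one common perturbation of this kind rewrites the numerals in a word problem so that the reasoning pattern survives while memorized answers do not. A hypothetical sketch (not the benchmark's code; function name and deltas are mine):

```python
import random
import re

def perturb_numbers(problem, seed=0):
    """Illustrative perturbation: shift every integer in the problem
    by a small nonzero amount, preserving the reasoning structure
    while invalidating any memorized final answer."""
    rng = random.Random(seed)

    def swap(match):
        n = int(match.group())
        return str(n + rng.choice([-2, -1, 1, 2]))  # never zero shift

    return re.sub(r"\d+", swap, problem)

print(perturb_numbers("Alice has 12 apples and gives 5 to Bob."))
```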

Maxim Ostroukhov, Ruslan Mikhailov, Vladimir Iashin, Artem Sokolov, Andrei Akshonov, Vitaly Protasov, Dmitrii Beloborodov, Vince Mullin, Roman Yokunda Enzmann, Georgios Kolovos, Jason Renders, Pavel Nesterov, Anton Repushko 2d ago

PRAGMA: Revolut Foundation Model

PRAGMA: foundation models for banking event sequences. Transformer-based architecture with self-supervised pretraining on financial transaction data.

Fengwei Teng, Jinyi Bai, Xinhao Yao, Demi Ruohan Wang, Jiahao Zhao, Zhijiang Guo 2d ago

Skip-Connected Policy Optimization for Implicit Advantage

Skip-Connected Policy Optimization (SKPO) for reinforcement learning on reasoning tasks. Improves upon GRPO by addressing high-variance advantage estimation.
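SKPO's own mechanism is not detailed here, but the GRPO baseline it improves on is standard: each sampled response's reward is normalized against the mean and standard deviation of its group, which amplifies tiny reward gaps into large, noisy advantages when a group's rewards are nearly identical. A minimal sketch of that estimator (illustrative, not the paper's code):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: normalize each response's
    reward by the mean and std of its sampled group.  When rewards
    barely differ, the small denominator inflates the estimates."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Nearly identical rewards -> large, high-variance advantages.
print(grpo_advantages([0.50, 0.51, 0.49, 0.50]))
```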

Nan Huang, Xiaoxiao Zhou, Junxia Cui, Mario Tapia-Pacheco, Tiffany Amariuta, Yang Li, Jingbo Shang 2d ago

EvoLen: Evolution-Guided Tokenization for DNA Language Model

EvoLen: evolution-guided tokenization approach for DNA language models. Addresses fundamental tokenization design challenges in biological sequence modeling.
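EvoLen's evolution-guided scheme is not described in this blurb; for context, the common baseline such work departs from is fixed k-mer tokenization, whose arbitrary chunk boundaries ignore biological signal entirely. A hypothetical sketch:

```python
def kmer_tokenize(seq, k=3):
    """Fixed-stride k-mer tokenization, a standard baseline for DNA
    language models: chop the sequence into non-overlapping width-k
    chunks.  The boundary placement is arbitrary -- one of the design
    challenges evolution-guided tokenizers aim to address."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

print(kmer_tokenize("ATGCGTACA"))  # three codon-like tokens
```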

Charles Arnal, Vivien Cabannes, Taco Cohen, Julia Kempe, Remi Munos 2d ago

Efficient RL Training for LLMs with Experience Replay

Experience replay for LLM post-training RL, formalizing optimal buffer design as a trade-off between sample efficiency and data freshness.
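The trade-off can be made concrete with a bounded buffer and a mixing knob: replaying more reuses data (sample efficiency) but drifts further from the current policy (freshness). The sketch below is hypothetical — `sample_batch` and `replay_ratio` are my names, not the paper's:

```python
import random
from collections import deque

def sample_batch(buffer, fresh, batch_size, replay_ratio=0.5, rng=random):
    """Mix fresh on-policy rollouts with replayed older ones.
    `replay_ratio` (hypothetical knob) sets how much of the batch is
    drawn from the buffer versus taken from the newest rollouts."""
    n_replay = min(int(batch_size * replay_ratio), len(buffer))
    batch = rng.sample(list(buffer), n_replay) + fresh[:batch_size - n_replay]
    buffer.extend(fresh)  # newest rollouts enter the bounded buffer
    return batch

buf = deque(range(10), maxlen=1000)  # old trajectories; maxlen evicts stale data
batch = sample_batch(buf, list(range(100, 108)), batch_size=8)
```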

Zhaolin Gao, Yu (Sid) Wang, Bo Liu, Thorsten Joachims, Kianté Brantley, Wen Sun 2d ago

p1: Better Prompt Optimization with Fewer Prompts

Prompt optimization method decomposing reward variance into response variance and prompt variance to identify which tasks are amenable to optimization.
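The decomposition is presumably the law of total variance: total reward variance splits into the mean within-prompt (response) variance plus the variance of per-prompt mean rewards. A sketch, assuming equal-sized response groups per prompt (function name is mine, not the paper's):

```python
import statistics

def decompose_variance(rewards_by_prompt):
    """Law of total variance over equal-sized response groups:
    Var(R) = E[Var(R | prompt)] + Var(E[R | prompt]).
    The first term is response (within-prompt) variance; the second
    is prompt (between-prompt) variance."""
    means = [statistics.fmean(rs) for rs in rewards_by_prompt]
    within = statistics.fmean(statistics.pvariance(rs) for rs in rewards_by_prompt)
    between = statistics.pvariance(means)
    return within, between

print(decompose_variance([[1, 2, 3], [4, 5, 6]]))
```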

Mehran Taghian, Yunke Peng, Xing Huang, Yao Wang, Yaoyuan Wang, Wei Guo, Yuanyong Luo, Tianchi Hu, Junsong Wang, Xin Wang, Hu Liu, Yu Cheng, Ziwei Yu, Hongliang Li, Mehdi Rahimifar, Lei Yan, Xuefei Wang, Zhuang Ma, Lei Liu, Hui Yu, Anandharaju Durai Raju, Hoang Le, Hei Yi Mak, Tanzila Rahman, Shadan Golestan 2d ago

HiFloat4 Format for Language Model Pre-training on Ascend NPUs

4-bit floating-point format (HiFloat4) for efficient language model pre-training on Ascend NPU hardware.
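HiFloat4's exact bit layout is not given here. As a point of reference, a plain 4-bit float in the widely used E2M1 layout (1 sign, 2 exponent, 1 mantissa bit, bias 1) decodes to only 15 distinct values, which is why format design matters so much at this width. An illustrative decoder, not Ascend's:

```python
def decode_e2m1(code):
    """Decode one 4-bit code in a generic E2M1 layout (1 sign bit,
    2 exponent bits, 1 mantissa bit, exponent bias 1).  HiFloat4's
    actual encoding may differ; this is only an illustration."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:                       # subnormal: 0.m * 2^(1 - bias)
        return sign * (man / 2)
    return sign * (1 + man / 2) * 2.0 ** (exp - 1)

values = sorted({decode_e2m1(c) for c in range(16)})
print(values)  # 15 distinct values spanning -6.0 .. 6.0
```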

Chia-Hong Hsu, Frank Wood 2d ago

Discrete Meanflow Training Curriculum

Training curriculum method for discrete flow-based image generation models to improve one-step sampling stability and quality.

Amrut Nadgir, Vijay Balasubramanian, Pratik Chaudhari 2d ago

How does Chain of Thought decompose complex tasks?

Demonstrates power-law scaling of classification error with the number of classes, and how chain-of-thought decomposition reduces error through task splitting.
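The constants and exponent below are placeholders, not the paper's fitted values, but they sketch why splitting helps under a power law:

```latex
% Assume K-way classification error follows a power law
% (c and \alpha > 0 are placeholder constants):
\varepsilon(K) = c\,K^{\alpha}.
% A chain of thought that splits the decision into two sequential
% \sqrt{K}-way steps fails if either step does, so to first order
\varepsilon_{\mathrm{CoT}}(K) \approx 2\,\varepsilon(\sqrt{K}) = 2c\,K^{\alpha/2},
% which beats the direct error c\,K^{\alpha} whenever K > 2^{2/\alpha}.
```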