Ax Aidar Myrzakhan, Tianyi Li, Bowei Guo, Shengkun Tang, Zhiqiang Shen 2/20/2026

Sink-Aware Pruning for Diffusion Language Models

Pruning technique for diffusion language models that reduces inference cost by reconsidering whether attention sinks should be preserved.

Ax Murat Onur Yildirim, Elif Ceren Gok Yildirim, Joaquin Vanschoren 2/20/2026

Unlocking [CLS] Features for Continual Post-Training

Continual learning approach for foundation models that addresses the stability-plasticity trade-off during post-training on new classes or domains.

Ax Mark Lee, Chang Lan, Tom Gunter, John Peebles, Hanzhi Zhou, Kelvin Zou, Sneha Bangalore, Chung-Cheng Chiu, Nan Du, Xianzhi Du, Philipp Dufter, Ruixuan Hou, Haoshuo Huang, Dongseong Hwang, Xiang Kong, Jinhao Lei, Tao Lei, Meng Li, Li Li, Jiarui Lu, Zhiyun Lu, Yiping Ma, David Qiu, Vivek Rathod, Senyu Tong, Zhucheng Tu, Jianyu Wang, Yongqiang Wang, Zirui Wang, Floris Weers, Sam Wiseman, Guoli Yin, Bowen Zhang, Xiyou Zhou, Danyang Zhuo, Cheng Leong, Ruoming Pang 2/20/2026

AXLearn: Modular, Hardware-Agnostic Large Model Training

AXLearn, a production system with a modular software architecture for scalable, hardware-agnostic training of large models.

Ax Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev 2/20/2026

Watermarking Diffusion Language Models

First watermarking method for diffusion language models (DLMs), addressing the challenges that arise because DLMs generate tokens non-sequentially.

Ax Ruchi Sandilya, Sumaira Perez, Charles Lynch, Lindsay Victoria, Benjamin Zebley, Derrick Matthew Buchanan, Mahendra T. Bhati, Nolan Williams, Timothy J. Spellman, Faith M. Gunning, Conor Liston, Logan Grosenick 2/20/2026

Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation

Introduces ConDA, a contrastive learning layer for organizing diffusion model latent spaces to enable controllable generation.

Ax Arshia Soltani Moakhar, Tanapoom Laoaron, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi 2/20/2026

Active Learning for Decision Trees with Provable Guarantees

Theoretical analysis of active learning label complexity for decision trees with provable polylogarithmic guarantees.

Ax Yijun Ma, Zehong Wang, Weixiang Sun, Yanfang Ye 2/20/2026

Temporal Graph Pattern Machine

Introduces a temporal graph pattern machine that learns transferable representations in dynamic networks without restrictive assumptions.