James O'Neill, Robert Clancy, Mariia Matskevichus, Fergal Reid 24d ago

Low-Rank Key Value Attention

Low-Rank Key-Value (LRKV) attention reduces the memory footprint of the transformer KV cache by exploiting redundancy across attention heads, representing per-head differences as low-rank residuals.
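
The summary does not give the exact parameterization. The following is a minimal sketch under one assumed reading of "redundancy across heads with low-rank residuals": each head's keys are a shared component plus a head-specific low-rank term, K_h = K_shared + Z_h B_h, and only K_shared and the rank-r latents Z_h are cached (values handled the same way). The helpers `head_keys` and `cache_floats_per_token` are hypothetical names, and this is not the authors' implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def head_keys(shared_keys: Matrix, latent: Matrix, up_proj: Matrix) -> Matrix:
    """Hypothetical per-head key reconstruction: K_h = K_shared + Z_h @ B_h.

    shared_keys: seq x head_dim, cached once for all heads (assumption)
    latent:      seq x r, cached low-rank residual latent for this head
    up_proj:     r x head_dim, learned per-head up-projection B_h
    """
    return add(shared_keys, matmul(latent, up_proj))


def cache_floats_per_token(n_heads: int, head_dim: int, rank: int) -> Dict[str, int]:
    """Cached floats per token per layer, counting both keys and values."""
    standard = 2 * n_heads * head_dim                 # every head caches its own K and V
    shared_lowrank = 2 * (head_dim + n_heads * rank)  # shared K/V plus per-head rank-r latents
    return {"standard": standard, "shared_lowrank": shared_lowrank}
```

Under these assumptions, 32 heads of dimension 128 with rank 16 cache 1,280 floats per token per layer instead of 8,192, a 6.4x reduction; the real savings depend on how LRKV is actually parameterized and what it caches.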

Alex Morehead, Miruna Cretu, Antonia Panescu, Rishabh Anand, Maurice Weiler, Tynan Perez, Samuel Blau, Steven Farrell, Wahid Bhimji, Anubhav Jain, Hrushikesh Sahasrabuddhe, Pietro Lio, Tommi Jaakkola, Rafael Gomez-Bombarelli, Rex Ying, N. Benjamin Erichson, Michael W. Mahoney 24d ago

Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials

Open-source foundation model for 3D molecular and materials modeling with both generative and predictive capabilities.

Chenxu Yang, Chuanyu Qin, Qingyi Si, Minghui Chen, Naibin Gu, Dingyu Yao, Zheng Lin, Weiping Wang, Jiaqi Wang, Nan Duan 24d ago

Self-Distilled RLVR

On-policy self-distillation approach for LLM training combining dense teacher signals with sparse verifiable rewards from environment feedback.
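
The summary does not say how the two signals are combined. As a hedged illustration only, one generic way to mix a sparse, sequence-level verifiable reward with a dense, per-token teacher signal is to normalize rewards within a group of sampled responses (a GRPO-style choice, assumed here) and add a scaled per-token teacher-minus-student log-probability gap. The function names and the weight `beta` are hypothetical; how the paper forms its self-distilled teacher is not reproduced.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Sparse signal: one verifiable reward per sampled response to the same
    prompt, normalized within the group (GRPO-style; an assumption)."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def token_advantages(seq_adv: float,
                     teacher_logprobs: Sequence[float],
                     student_logprobs: Sequence[float],
                     beta: float = 0.1) -> List[float]:
    """Dense signal: the sequence-level advantage is broadcast to every token,
    plus beta times the teacher-student log-prob gap on the sampled token."""
    return [seq_adv + beta * (t - s) for t, s in zip(teacher_logprobs, student_logprobs)]
```

The small `eps` keeps the normalization defined when every response in a group receives the same reward, in which case the sparse term is zero and only the dense teacher term remains.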

Tijana Zrnic, Emmanuel J. Candès 24d ago

Active Statistical Inference

Active statistical inference for ML-assisted data collection: a model's predictions are used to decide which data points merit labeling under a labeling budget, yielding valid statistical estimates from fewer labels.
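
A minimal sketch for the special case of estimating a mean, assuming each point is labeled independently with a probability π(x) that grows with the model's uncertainty. The estimator adds an inverse-probability-weighted correction (y − f(x)) / π(x) to the model's prediction on labeled points, which keeps it unbiased whatever the model's quality. The function names are hypothetical, and the paper's general estimator and confidence intervals are not reproduced here.

```python
from typing import List, Optional, Sequence


def sampling_probs(uncertainty: Sequence[float], budget: float) -> List[float]:
    """Label probability proportional to model uncertainty, scaled so that
    roughly a `budget` fraction of points is labeled in expectation.
    Probabilities are clipped to [1e-6, 1], so the expected labeled
    fraction can deviate slightly from `budget`."""
    n, total = len(uncertainty), sum(uncertainty)
    return [min(1.0, max(1e-6, budget * n * u / total)) for u in uncertainty]


def active_mean(preds: Sequence[float],
                probs: Sequence[float],
                labeled: Sequence[bool],
                labels: Sequence[Optional[float]]) -> float:
    """Mean estimate: prediction f plus (y - f) / pi on points that were labeled.

    Since a point is labeled with probability pi, the correction equals
    (y - f) in expectation, so each term is unbiased for y."""
    total = 0.0
    for f, p, xi, y in zip(preds, probs, labeled, labels):
        total += f + ((y - f) / p if xi else 0.0)
    return total / len(preds)
```

Spending labels where the model is least certain is what makes the scheme "active": confident predictions need little correction, so their points can be sampled rarely without inflating variance much.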

Achraf Azize, Marc Jourdan, Aymen Al Marjani, Debabrota Basu 24d ago

Differentially Private Best-Arm Identification

Studies best-arm identification under differential privacy in both the local and central models, motivated by privacy-sensitive applications such as clinical trials and hyperparameter tuning.
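
To make the local/central distinction concrete, here is a sketch of the textbook Laplace-mechanism building blocks for privately estimating one arm's mean reward, assuming rewards lie in [0, 1]. These are generic primitives, not the paper's algorithms, and they ignore the privacy accounting needed when arms are sampled adaptively. In the central model a trusted curator noises the aggregate (noise shrinks like 1/n); in the local model every reward is noised before it is shared (noise shrinks only like 1/√n).

```python
import random
from statistics import mean
from typing import Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def central_private_mean(rewards: Sequence[float], epsilon: float,
                         rng: random.Random) -> float:
    """Central model: the curator sees raw rewards in [0, 1]. Changing one
    reward moves the mean by at most 1/n, so scale 1/(n * epsilon) is epsilon-DP."""
    n = len(rewards)
    return mean(rewards) + laplace(1.0 / (n * epsilon), rng)


def local_private_mean(rewards: Sequence[float], epsilon: float,
                       rng: random.Random) -> float:
    """Local model: each reward in [0, 1] is privatized by its owner
    (sensitivity 1, scale 1/epsilon); the learner only averages noisy values."""
    return mean([r + laplace(1.0 / epsilon, rng) for r in rewards])
```

At the same ε, the local estimate is much noisier, which is why local-model algorithms generally need more samples to identify the best arm.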

Timo Gierlich, Andreas Baumbach, Akos F. Kungl, Kevin Max, Mihai A. Petrovici 24d ago

Spike-based alignment learning solves the weight transport problem

Spike-based alignment learning resolves the weight transport problem in neural networks, enabling learning through local computations that is compatible with biological neural networks and neuromorphic hardware.

Andrea Montanari, Viet Vu 24d ago

Computational bottlenecks for denoising diffusions

Analyzes computational bottlenecks in denoising diffusions, examining how efficiently the drift can be learned and then used for sampling to approximate a target probability distribution.