Ax Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan 9d ago

Continuous Adversarial Flow Models

Continuous-time flow models trained with adversarial objectives using learned discriminators instead of fixed MSE criteria.

Ax Denizalp Goktas, Gerardo Riaño-Briceño, Alif Abdullah, Aryan Nair, Chenkai Shen, Beatriz de Lucio, Alexandra Magnusson, Farhan Mashrur, Ahmed Abdulla, Shawrna Sen, Mahitha Thippireddy, Gregory Schwartz, Amy Greenwald 9d ago

TempusBench: An Evaluation Framework for Time-Series Forecasting

Evaluation framework and benchmark for assessing time-series foundation models and forecasting approaches.

Ax Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi 9d ago

Towards Autonomous Mechanistic Reasoning in Virtual Cells

Framework for autonomous mechanistic reasoning in virtual cells using LLMs, representing biological reasoning as mechanistic action graphs.

Ax Nicolas Rodriguez-Alvarez (Instituto de Educacion Secundaria Parquesol, Valladolid, Spain), Fernando Rodriguez-Merino (University of Valladolid, Valladolid, Spain) 9d ago

Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning

Methodology to mitigate shortcut learning and demographic bias in deep neural networks using geometric a priori approaches.

Ax J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang 9d ago

Autonomous Diffractometry Enabled by Visual Reinforcement Learning

Model-free reinforcement learning system for autonomous crystal alignment using visual information without domain knowledge of crystallography.

Ax Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong 9d ago

A Mechanistic Analysis of Looped Reasoning Language Models

Mechanistic analysis of looped reasoning language models examining internal dynamics and latent state evolution compared to standard feedforward models.

Ax Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak 9d ago

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

Uses reinforcement learning on physics simulators to train models to solve Physics Olympiad problems, addressing the lack of large-scale physics QA datasets for reasoning models.

Ax Jon M Laurent, Albert Bou, Michael Pieler, Conor Igoe, Alex Andonian, Siddharth Narayanan, James Braza, Alexandros Sanchez Vassopoulos, Jacob L Steenwyk, Blake Lash, Andrew D White, Samuel G Rodriques 9d ago

LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

Improved benchmark for evaluating AI systems and agents on real-world biology research tasks.

Ax Magda Dubois, Ekin Zorer, Maia Hamin, Joe Skinner, Alexandra Souly, Jerome Wynne, Harry Coppock, Lucas Satos, Sayash Kapoor, Sunischal Dev, Keno Juchems, Kimberly Mai, Timo Flesch, Lennart Luettgau, Charles Teague, Eric Patey, JJ Allaire, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Cozmin Ududec 9d ago

Seven simple steps for log analysis in AI systems

Pipeline and best practices for log analysis in AI systems to understand model behaviors, with code examples in the Inspect framework.

Ax Yaniv Leviathan, Dani Valevski, Matan Kalman, Danny Lumen, Eyal Segalis, Eyal Molad, Shlomi Pasternak, Vishnu Natchu, Valerie Nygaard, Srinivasan (Cheenu) Venkatachary, James Manyika, Yossi Matias 9d ago

Generative UI: LLMs are Effective UI Generators

Demonstrates that LLMs can generate user interfaces and content together, given proper prompting and tool integration.

Ax Jash Vira, Ashley Harris 9d ago

Spatial Competence Benchmark

Spatial Competence Benchmark (SCBench) evaluating large models on spatial reasoning, environment representation, and planning tasks.

Ax Justin Li, Daniel Ding, Asmita Yuki Pritha, Aryana Hou, Xin Wang, Shu Hu 9d ago

Robust Fair Disease Diagnosis in CT Images

Deep learning approach for fair disease diagnosis in chest CT addressing compound failures from class imbalance and demographic underrepresentation.

Ax Kyle Waters, Lucas Nuzzi, Tadhg Looram, Alessandro Tomasiello, Ariel Ghislain Kemogne Kamdoum, Bikun Li, Damien Sileo, Egor Kretov, Francesco Fournier-Facio, Georgios Soloupis, Haile Kassahun, Hew Wolff, Jiaqi Cai, Lianghui Li, Marc Roth, Mohinder Naiya, Naixu Guo, Qicheng Tang, Richard Wheeler, Samuele Sala, Serguei Popov, Steven Dillman, Yuqi Li 9d ago

COMPOSITE-STEM

COMPOSITE-STEM benchmark with 70 expert-written tasks for evaluating AI agents on physics, biology, chemistry, and materials science problems.

Ax Aayush Mishra, Daniel Khashabi, Anqi Liu 9d ago

Steered LLM Activations are Non-Surjective

Research on activation steering in LLMs showing steered states are non-surjective, with implications for interpretability and safety.

Ax Vasilis Kontonis, Yuchen Zeng, Shivam Garg, Lingjiao Chen, Hao Tang, Ziyan Wang, Ahmed Awadallah, Eric Horvitz, John Langford, Dimitris Papailiopoulos 9d ago

MEMENTO: Teaching LLMs to Manage Their Own Context

MEMENTO teaches LLMs to compress reasoning into dense summaries, reducing context and compute requirements. Releases OpenMementos dataset of 228K examples.