Ax Yang Liu, Enxi Wang, Yufei Gao, Weixin Zhang, Bo Wang, Zhiyuan Zeng, Yikai Zhang, Yining Zheng, Xipeng Qiu 9d ago

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Memory-Enhanced Dynamic Reward Shaping (MEDS), a reinforcement learning framework that reduces the recurrence of failure patterns in LLM training.

Ax Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan 9d ago

Continuous Adversarial Flow Models

Continuous-time flow models trained with adversarial objectives using learned discriminators instead of fixed MSE criteria.

Ax Denizalp Goktas, Gerardo Riaño-Briceño, Alif Abdullah, Aryan Nair, Chenkai Shen, Beatriz de Lucio, Alexandra Magnusson, Farhan Mashrur, Ahmed Abdulla, Shawrna Sen, Mahitha Thippireddy, Gregory Schwartz, Amy Greenwald 9d ago

TempusBench: An Evaluation Framework for Time-Series Forecasting

Evaluation framework and benchmark for assessing time-series foundation models and forecasting approaches.

Ax Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi 9d ago

Towards Autonomous Mechanistic Reasoning in Virtual Cells

Framework for autonomous mechanistic reasoning in virtual cells using LLMs, representing biological reasoning as mechanistic action graphs.

Ax Nicolas Rodriguez-Alvarez (Instituto de Educacion Secundaria Parquesol, Valladolid, Spain), Fernando Rodriguez-Merino (University of Valladolid, Valladolid, Spain) 9d ago

Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning

Methodology to mitigate shortcut learning and demographic bias in deep neural networks using geometric a priori approaches.

Ax J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang 9d ago

Autonomous Diffractometry Enabled by Visual Reinforcement Learning

Model-free reinforcement learning system for autonomous crystal alignment using visual information without domain knowledge of crystallography.

Ax Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong 9d ago

A Mechanistic Analysis of Looped Reasoning Language Models

Mechanistic analysis of looped reasoning language models examining internal dynamics and latent state evolution compared to standard feedforward models.

Ax Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak 9d ago

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

Uses reinforcement learning on physics simulators to train models to solve Physics Olympiad problems, addressing the lack of large-scale physics QA datasets for reasoning models.

Ax Jon M Laurent, Albert Bou, Michael Pieler, Conor Igoe, Alex Andonian, Siddharth Narayanan, James Braza, Alexandros Sanchez Vassopoulos, Jacob L Steenwyk, Blake Lash, Andrew D White, Samuel G Rodriques 9d ago

LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

Improved benchmark for evaluating AI systems and agents on real-world biology research tasks.

Ax Magda Dubois, Ekin Zorer, Maia Hamin, Joe Skinner, Alexandra Souly, Jerome Wynne, Harry Coppock, Lucas Satos, Sayash Kapoor, Sunischal Dev, Keno Juchems, Kimberly Mai, Timo Flesch, Lennart Luettgau, Charles Teague, Eric Patey, JJ Allaire, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Cozmin Ududec 9d ago

Seven simple steps for log analysis in AI systems

Pipeline and best practices for log analysis in AI systems to understand model behaviors, with code examples in the Inspect framework.

Ax Yaniv Leviathan, Dani Valevski, Matan Kalman, Danny Lumen, Eyal Segalis, Eyal Molad, Shlomi Pasternak, Vishnu Natchu, Valerie Nygaard, Srinivasan (Cheenu) Venkatachary, James Manyika, Yossi Matias 9d ago

Generative UI: LLMs are Effective UI Generators

Demonstrates that LLMs can generate user interfaces and content together, given proper prompting and tool integration.

Ax Jash Vira, Ashley Harris 9d ago

Spatial Competence Benchmark

Spatial Competence Benchmark (SCBench) evaluating large models on spatial reasoning, environment representation, and planning tasks.