Ax Jiaren Peng, Zeqin Li, Chang You, Yan Wang, Hanlin Sun, Xuan Tian, Shuqiao Zhang, Junyi Liu, Jianguo Zhao, Renyang Liu, Haoran Ou, Yuqiang Sun, Jiancheng Zhang, Yutong Jiao, Kunshu Song, Chao Zhang, Fan Shi, Hongda Sun, Rui Yan, Cheng Huang 25d ago

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

Systematic analysis and benchmark comparing LLM-based automated penetration testing frameworks for autonomous security testing.

Ax Tim Lukas Adam, Phongsakon Mark Konrad, Riccardo Terrenzi, Florian Girardo Lukas, Rahime Yilmaz, Krzysztof Sierszecki, Serkan Ayvaz 25d ago

CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models

CAKE benchmark with 188 expert-validated questions evaluating LLMs' understanding of cloud-native software architecture across Bloom's taxonomy levels.

Ax Osama Orabi, Artur Zagitov, Hadi Salloum, Viktor A. Lobachev, Kasymkhan Khubiev, Yaroslav Kholodov 25d ago

Neural Network Pruning via QUBO Optimization

Neural network pruning formulated as a QUBO problem, with principled objective formulations that capture filter interactions.

Ax David Picard, Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Davide Allegro, Tom Ravaud, Yohann Perron, Corentin Sautier, Zeynep Sonat Baltaci, Fei Meng, Syrine Kalleli, Marta López-Rauhut, Thibaut Loiseau, Ségolène Albouy, Raphael Baena, Elliot Vincent, Loic Landrieu 25d ago

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

Introduces the Polynomial Mixer (PoM), a linear-time token-mixing mechanism that replaces self-attention in transformers while preserving universality.

Ax Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf 25d ago

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

A pipeline that generates synthetic doctor-patient conversations for training and evaluating long-form audio summarization models.

Ax Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li, Wenping Wang, Paul Debevec 25d ago

DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models

Diffusion model approach for converting low dynamic range video to HDR through scene radiance estimation.

Ax Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai 25d ago

In-Place Test-Time Training

Test-time training method updates LLM fast weights at inference to adapt dynamically to new information streams.

Ax Alaa Saleh, Sasu Tarkoma, Praveen Kumar Donta, Anders Lindgren, Naser Hossein Motlagh, Schahram Dustdar, Susanna Pirttikangas, Lauri Lovén 25d ago

UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces

UserCentrix is a hybrid agentic orchestration framework for smart spaces combining memory augmentation with multi-agent coordination.

Ax Tianyu Liu, Simeng Han, Hanchen Wang, Xiao Luo, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Yinsheng Lu, Xinyu Wei, Qinzhe Xing, Antonia Panescu, Mengbo Wang, Vibha Annaswamy, Alicia Sanchez, Jack Cloherty, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao 25d ago

Advancing AI Research Assistants with Expert-Involved Learning

ARIEL framework pairs expert-vetted biomedical tasks with LLMs for evaluation and optimization of AI research assistants.

Ax Bohan Tang, Dezhao Luo, Jianheng Liu, Jingxuan Chen, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao 25d ago

Beyond Syntax: Action Semantics Learning for App Agents

Fine-tunes open-source LLMs for smartphone app control by learning action semantics rather than syntax, reducing API costs.

Ax Michael Grosskopf, Nathan Debardeleben, Russell Bent, Rahul Somasundaram, Isaac Michaud, Arthur Lui, Alexius Wadell, Warren D. Graham, Golo A Wimmer, Sachin Shivakumar, Joan Vendrell Gallart, Harsha Nagarajan, Earl Lawrence 25d ago

URSA: The Universal Research and Scientific Agent

URSA framework enables LLMs to conduct autonomous research through complex reasoning, planning, coding, and multi-agent collaboration.

Ax Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Mercy Asiedu, Ines Mezerreg, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang 25d ago

MedGemma Technical Report

MedGemma is a collection of medical vision-language foundation models designed for healthcare AI tasks with privacy preservation.

Ax Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi 25d ago

Multiplayer Nash Preference Optimization

Extends Nash learning from human feedback to a multiplayer setting in order to capture non-transitive and heterogeneous preferences in LLM alignment.