Ax Angelika Romanou, Mark Ibrahim, Candace Ross, Chantal Shaib, Kerem Oktar, Samuel J. Bell, Anaelia Ovalle, Jesse Dodge, Antoine Bosselut, Koustuv Sinha, Adina Williams 27d ago

Brittlebench: Quantifying LLM robustness via prompt sensitivity

Brittlebench is a framework that quantifies LLM robustness by evaluating prompt sensitivity, going beyond static benchmarks.

Ax Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Zhaohui Wang, Jiexi Wu, Zhixin Pan, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di Yin, Xing Sun, Muhan Zhang 27d ago

HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

HISA is a hierarchical indexing method that makes fine-grained sparse attention in LLMs more efficient by reducing the indexer bottleneck in token selection.

Ax Qing Lyu, Jianxu Wang, Jeremy Hudson, Ge Wang, Christopher T. Whitlow 27d ago

MRI-to-CT synthesis using drifting models

Medical imaging technique using drifting models to synthesize CT images from MRI for pelvic imaging without ionizing radiation.

Ax Aiman Al Masoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu, Vignesh Kumar Kembu, Serena Nicolazzo, Antonino Nocera, Vinod P., Saraga Sakthidharan 27d ago

Security in LLM-as-a-Judge: A Comprehensive SoK

Systematization of knowledge on security and reliability risks in the LLM-as-a-Judge paradigm. Documents vulnerabilities where judges become targets of adversarial manipulation.

Ax Gabriel U. Talasso, Meghdad Kurmanji, Allan M. de Souza, Nicholas D. Lane, Leandro A. Villas 27d ago

Task-Centric Personalized Federated Fine-Tuning of Language Models

Personalized federated learning approach for fine-tuning language models on heterogeneous tasks. Improves performance on diverse client tasks while maintaining privacy.

Ax Ken M. Nakanishi 27d ago

Screening Is Enough

Multiscreen attention mechanism for language models. Introduces absolute query-key relevance to reject irrelevant keys, addressing softmax attention limitations.

Ax Xiaofan Zhou, Huy Nguyen, Bo Yu, Chenxi Liu, Lu Cheng 27d ago

Adaptive Stopping for Multi-Turn LLM Reasoning

Adaptive stopping mechanism for multi-turn LLM reasoning. Determines optimal stopping points for agents using retrieval-augmented generation and ReAct-style interactions.

Ax Zirui Zhao, Jun Hao Liew, Yan Yang, Wenzhuo Yang, Ziyang Luo, Doyen Sahoo, Silvio Savarese, Junnan Li 27d ago

GPA: Learning GUI Process Automation from Demonstrations

Vision-based robotic process automation (RPA) using sequential Monte Carlo localization. Enables stable GUI automation from single demonstrations with improved robustness.

Ax Dun Yuan, Fuyuan Lyu, Ye Yuan, Weixu Zhang, Bowei He, Jiayi Geng, Linfeng Du, Zipeng Sun, Yankai Chen, Changjiang Han, Jikun Kang, Alex Chen, Haolun Wu, Xue Liu 27d ago

Beyond Message Passing: A Semantic View of Agent Communication Protocols

Framework for analyzing agent communication protocols across three layers: communication, syntactic, and semantic. Systematically studies 18 representative protocols for LLM systems.

Ax Xun Sun, Baiheng Xie, Li Huang, Qiang Gao 27d ago

Scaling DPPs for RAG: Density Meets Diversity

Method for scaling determinantal point processes (DPPs) in RAG systems to improve the diversity of retrieved context while maintaining relevance.

Ax Lin Wang, Junfeng Fang, Dan Zhang, Fei Shen, Xiang Wang, Tat-Seng Chua 27d ago

DRAFT: Task Decoupled Latent Reasoning for Agent Safety

Framework for monitoring safety of tool-using LLM agents through latent reasoning that decouples safety judgment into trainable stages.

Ax William Merrill, Yanhong Li, Tyler Romero, Anej Svete, Caia Costello, Pradeep Dasigi, Dirk Groeneveld, David Heineman, Bailey Kuehl, Nathan Lambert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal 27d ago

Olmo Hybrid: From Theory to Practice and Back

Olmo Hybrid: theoretical and empirical analysis of hybrid models that combine linear RNNs and attention as an alternative to pure transformers, with scaling benefits.

Ax David Sewell, Xingjian Li, Stepan Tretiakov, Krishna Kumar, David Fridovich-Keil 27d ago

Neural Operators for Multi-Task Control and Adaptation

Neural operator methods for multi-task optimal control problems, mapping task descriptions to control policies using permutation-invariant architectures.

Ax Wenjing Gong, Udbhav Srivastava, Yuchen Wang, Yuhao Jia, Qifan Wu, Weishan Bai, Yifan Yang, Xiao Huang, Xinyue Ye 27d ago

Earth Embeddings Reveal Diverse Urban Signals from Space

Benchmark of Earth embedding models (AlphaEarth, Prithvi, Clay) for neighborhood-scale urban monitoring from satellite imagery.