Isolater - Feed

Ax Florent Delgrange 3/2/2026

Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments

Vision for foundation world models enabling autonomous agents to learn, verify, and adapt reliably in open, non-static environments.

Ax Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu 3/2/2026

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Multi-agent workflow system that translates jailbreak papers into executable modules for unified benchmarking of LLM robustness techniques.

Ax Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim 3/2/2026

Interpretable Debiasing of Vision-Language Models for Social Fairness

Model-agnostic interpretable debiasing method for vision-language models to mitigate unintended social bias in black-box reasoning processes.

Ax Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie, Ido Hakimi, Barna Pásztor, Andreas Krause 3/2/2026

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

RewardUQ: Framework for uncertainty quantification in reward models used to align LLMs with human preferences, reducing annotation costs.

Ax Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral 3/2/2026

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Data-driven optimization pipeline for scheduling and caching hundreds of LLM adapters in distributed serving to maximize GPU throughput.

Ax Chenwei Jia, Baoting Li, Xuchong Zhang, Mingzhuo Wei, Bochen Lin, Hongbin Sun 3/2/2026

Quant Experts: Token-aware Adaptive Error Reconstruction with Mixture of Experts for Large Vision-Language Models Quantization

Quant Experts: Post-training quantization method for vision-language models using mixture of experts for token-aware adaptive error reconstruction.

Ax Donghao Huang, Zhaoxia Wang 3/2/2026

Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis

Empirical study evaluating whether reasoning capabilities universally improve LLM performance across sentiment analysis tasks of varying complexity.

Ax Viet Bac Nguyen, Phuong Thai Nguyen 3/2/2026

Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

ACWI framework for dynamically balancing intrinsic and extrinsic rewards in sparse reward reinforcement learning through adaptive scaling.

Ax Jaekyung Cho 3/2/2026

Preference Packing: Efficient Preference Optimization for Large Language Models

Preference packing technique for efficient batch training of LLMs during preference optimization (RLHF), improving resource utilization.

Ax Yuxuan Zhang, Katarína Tóthová, Zian Wang, Kangxue Yin, Haithem Turki, Riccardo de Lutio, Yen-Yu Chang, Or Litany, Sanja Fidler, Zan Gojcic 3/2/2026

DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

DiffusionHarmonizer method combining neural reconstruction with diffusion models to enhance photorealism in robotic simulation environments.

Ax Sara Nabhani, Federico Pianzola, Khalid Al-Khatib, Malvina Nissim 3/2/2026

ARGUS: Seeing the Influence of Narrative Features on Persuasion in Argumentative Texts

ARGUS framework studying how narrative features in argumentative texts influence persuasion using corpus analysis and modeling.

Ax Vikash Singh, Debargha Ganguly, Haotian Yu, Chengwei Zhou, Prerna Singh, Brandon Lee, Vipin Chaudhary, Gourav Datta 3/2/2026

Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification

Formal verification approach for vision-language models drafting radiology reports to ensure logical consistency in clinical reasoning.

Ax James L. Zainaldin, Cameron Pattison, Manuela Marai, Jacob Wu, Mark J. Schiefsky 3/2/2026

Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek

Systematic evaluation of LLM machine translation for Ancient Greek technical prose, showing terminology rarity predicts translation failures.

Ax Omar Mohamed, Edoardo Fazzari, Ayah Al-Naji, Hamdan Alhadhrami, Khalfan Hableel, Saif Alkindi, Cesare Stefanini 3/2/2026

Multimodal Optimal Transport for Unsupervised Temporal Segmentation in Surgical Robotics

Unsupervised temporal segmentation method for surgical video analysis using optimal transport, questioning necessity of large-scale pre-training.

Ax Yuxuan Liu, Weikai Xu, Kun Huang, Changyu Chen, Jiankun Zhao, Pengzhi Gao, Wei Liu, Jian Luan, Shuo Shang, Bo Du, Ji-Rong Wen, Rui Yan 3/2/2026

CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

CoME: Mobile agent architecture with four expert modules for hybrid reasoning including screen understanding, planning, and action execution.

Ax Adam Dejl, Deniz Gorur, Francesca Toni 3/2/2026

ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models

ArgLLM-App: Interactive web system implementing argumentative reasoning agents with LLMs for explainable binary decision-making tasks.

Ax Dor Tsur, Sharon Adar, Ran Levy 3/2/2026

Task-Centric Acceleration of Small-Language Models

TASC framework for accelerating small language models through task-adaptive sequence compression and vocabulary enrichment during fine-tuning.

Ax Rishabh Kabra, Maks Ovsjanikov, Drew A. Hudson, Ye Xia, Skanda Koppula, Andre Araujo, Joao Carreira, Niloy J. Mitra 3/2/2026

A Mixed Diet Makes DINO An Omnivorous Vision Encoder

Pre-training method for vision encoders (DINO) to improve cross-modal feature alignment between RGB images and depth maps.

Ax Kush Grover, Markel Zubia, Debraj Chakraborty, Muqsit Azeem, Nils Jansen, Jan Kretinsky 3/2/2026

Resilient Strategies for Stochastic Systems: How Much Does It Take to Break a Winning Strategy?

Study of resilient decision-making strategies for agents under uncertainty and disturbances that could disrupt intended actions.

Ax Mohsen Tajgardan, Atena Shiranzaei, Mahdi Rabbani, Reza Khoshkangini, Mahtab Jamali 3/2/2026

An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks

Federated learning approach for anomaly detection across heterogeneous IoT devices while preserving privacy without centralized data collection.

Ax Haritz Puerto, Haonan Li, Xudong Han, Timothy Baldwin, Iryna Gurevych 3/2/2026

Controllable Reasoning Models Are Private Thinkers

Method for training reasoning models to follow instructions in reasoning traces to prevent unintended leakage of private information in AI agents processing sensitive user data.

Ax Jialiang Fan, Weizhe Xu, Mengyu Liu, Oleg Sokolsky, Insup Lee, Fangxin Kong 3/2/2026

SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

SafeGen-LLM framework enhances safety in robotic task planning by combining LLMs with safety constraints, addressing generalization challenges in classical and RL-based planning methods.

Ax Kriti Thakur, Alivelu Manga Parimi, Mayukha Pal 3/2/2026

FaultXformer: A Transformer-Encoder Based Fault Classification and Location Identification model in PMU-Integrated Active Electrical Distribution System

FaultXformer: Transformer-based model for fault classification and localization in electrical distribution systems using PMU data.

Ax Dake Zhang, Mark D. Smucker, Charles L. A. Clarke 3/2/2026

Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment

TREC 2025 DRAGUN track resources for evaluating RAG systems that help readers assess news trustworthiness with attributed reports.

Ax Ali Behrouz, Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni 3/2/2026

Memory Caching: RNNs with Growing Memory

Exploration of recurrent architectures with growing memory as subquadratic alternatives to Transformers for sequence modeling.

Ax Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan 3/2/2026

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

LoRA-Pre method reducing memory overhead in optimizers like Adam via low-rank approximation of momentum states.

Ax Weinan Dai, Hanlin Wu, Qiying Yu, Huan-ang Gao, Jiahao Li, Chengquan Jiang, Weiqiang Lou, Yufan Song, Hongli Yu, Jiaze Chen, Wei-Ying Ma, Ya-Qin Zhang, Jingjing Liu, Mingxuan Wang, Xin Liu, Hao Zhou 3/2/2026

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

CUDA Agent system using large-scale agentic RL to generate optimized GPU kernels, bridging the gap between LLMs and compiler-based systems.

Ax Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas 3/2/2026

Do LLMs Benefit From Their Own Words?

Study comparing standard multi-turn prompting with user-turn-only prompting to determine if LLMs benefit from their own prior responses.

Ax Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang 3/2/2026

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Research on offline-to-online multi-agent reinforcement learning with offline value function memory and sequential exploration strategies.

Ax Faria Huq, Zora Zhiruo Wang, Frank F. Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P. Bigham, Graham Neubig 3/2/2026

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

CowPilot framework enabling autonomous and human-agent collaborative web navigation with preference modeling and human oversight.

Ax Dawei Cheng, Wenjun Wang, Mingjian Guang 3/2/2026

Language Models as Messengers: Enhancing Message Passing in Heterophilic Graph Learning

Method using language models to improve message passing in heterophilic graph neural networks by leveraging semantic node text.

Ax Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang 3/2/2026

CoMind: Towards Community-Driven Agents for Machine Learning Engineering

MLE-Live framework for evaluating LLM agents in ML engineering that engage with research communities through knowledge sharing and communication.

Ax Patrick Taillandier, Jean Daniel Zucker, Arnaud Grignard, Benoit Gaudou, Nghi Quang Huynh, Alexis Drogoul 3/2/2026

Integrating LLM in Agent-Based Social Simulation: Opportunities and Challenges

Position paper examining opportunities and limitations of integrating LLMs into agent-based social simulations from a computational social science perspective.

Ax Wenliang Li, Rui Yan, Xu Zhang, Li Chen, Hongji Zhu, Jing Zhao, Junjun Li, Mengru Li, Wei Cao, Zihang Jiang, Wei Wei, Kun Zhang, Shaohua Kevin Zhou 3/2/2026

MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM

Multi-agent system for clinical diagnosis that accumulates self-learned clinical knowledge across agent interactions for improved LLM performance.

Ax Xuyan Ma, Xiaofei Xie, Yawen Wang, Junjie Wang, Boyu Wu, Mingyang Li, Qing Wang 3/2/2026

Demystifying the Lifecycle of Failures in Platform-Orchestrated Agentic Workflows

Analysis of failure modes in multi-agent workflows built on low-code orchestration platforms, examining propagation across heterogeneous nodes.

Ax Xiaoyang Cao, Zelai Xu, Mo Guang, Kaiwen Long, Michiel A. Bakker, Yu Wang, Chao Yu 3/2/2026

RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment

RE-PO framework for robust LLM alignment that handles noisy preference data and unreliable annotations in RLHF-style training.

Ax Jiaxi Li, Yucheng Shi, Xiao Huang, Jin Lu, Ninghao Liu 3/2/2026

MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

MITS algorithm using pointwise mutual information to improve tree search reasoning in LLMs with better step quality assessment.

Ax Haolang Lu, Bolun Chu, WeiYe Fu, Guoshun Nan, Junning Liu, Minghui Pan, Qiankun Li, Yi Yu, Hua Wang, Kun Wang 3/2/2026

Reallocating Attention Across Layers to Reduce Multimodal Hallucination

Method for reducing hallucinations in multimodal LLMs by reallocating attention across layers to balance perception and reasoning.

Ax Tanmay Ambadkar, Đorđe Žikelić, Abhinav Verma 3/2/2026

Automating the Refinement of Reinforcement Learning Specifications

AutoSpec framework for automatically refining logical specifications in reinforcement learning through exploration-guided search strategies.

Ax Yongrui Yu, Zhongzhen Huang, Linjie Mu, Shaoting Zhang, Xiaofan Zhang 3/2/2026

Radiologist Copilot: An Agentic Framework Orchestrating Specialized Tools for Reliable Radiology Reporting

Agentic framework orchestrating specialized tools for automated radiology reporting, combining vision-language models with multi-step reasoning.

Ax Dawei Li, Abdullah Alnaibari, Arslan Bisharat, Manny Sandoval, Deborah Hall, Yasin Silva, Huan Liu 3/2/2026

From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?

Research on whether LLMs can mediate online conflicts by fostering empathy and constructive dialogue beyond content moderation.

Ax Kuai Yu, Naicheng Yu, Han Wang, Rui Yang, Huan Zhang 3/2/2026

How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors

Evaluation of visual UI design factors influencing web agent decision-making and task performance.

Ax Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuefeng Xiao, Hongyan Xie, Li Huaqiu, Songshi Liang, Zhongxiang Dai, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang 3/2/2026

Real-Time Aligned Reward Model beyond Semantics

Real-time alignment technique for RLHF reward models to prevent overoptimization and maintain human intent capture.

Ax Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuanda Wang, Zhixia Zhang, Hongyan Xie, Songshi Liang, Zehao Chen, Xuefeng Xiao, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang 3/2/2026

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Study on whether Large Reasoning Models know when to stop thinking, addressing redundancy in long chains-of-thought.

Ax Zewei Yu, Lirong Gao, Yuke Zhu, Bo Zheng, Junbo Zhao, Sheng Guo, Haobo Wang 3/2/2026

Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty

Training method for Large Reasoning Models using adaptive reflection and length penalties to reduce unnecessary token consumption.

Ax Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng 3/2/2026

ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

ForesightSafety Bench evaluates frontier risks in autonomous AI with unpredictable and difficult-to-control behaviors.

Ax Seoyoung Lee, Seobin Yoon, Seongbeen Lee, Yoojung Chun, Dayoung Park, Doyeon Kim, Joo Yong Sim 3/2/2026

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

IntentCUA framework for computer-use agents with intent-aligned planning and multi-agent coordination over long horizons.

Ax Tao Zhe, Haoyu Wang, Bo Luo, Min Wu, Wei Fan, Xiao Luo, Zijun Yao, Haifeng Chen, Dongjie Wang 3/2/2026

Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction

Layered execution structures for tool orchestration in agentic systems with reflective error correction mechanisms.

Ax Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang-Chiang Tsai, David Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, Yuri Chervonyi, Jonathan N. Lee, Garrett Bingham, Trieu H. Trinh, Vahab Mirrokni, Quoc V. Le, Thang Luong 3/2/2026

Aletheia tackles FirstProof autonomously

Aletheia AI agent solved 6/10 FirstProof mathematics challenges autonomously using Gemini 3 Deep Think reasoning.

Ax Abeer Dyoub, Francesca A. Lisi 3/2/2026

fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation

Risk-based fuzzy ethical decision-making framework with principle-level explainability and pluralistic validation.