Isolater - Feed

Ax Binglin Ji, Anindya Sarkar, Hengchang Lu, Jens Sjölund, Yevgeniy Vorobeychik 7/2/2026

Sequentially-Controlled Interactive Multi-Particle Flow-Maps for Online Feedback-Driven Search

Proposes flow-maps for sequential feedback-driven exploration in generative models with unknown preferences.

Ax Brett Reynolds 7/2/2026

Adversarial Pragmatics for AI Safety Evaluation: A Benchmark for Instruction Conflict, Embedded Commands, and Policy Ambiguity

Benchmark for evaluating LLM safety across instruction conflicts, embedded commands, and policy ambiguity in agentic tasks.

Ax Zhuoxuan Zhang, Kangqi Ni, Yuhang Chen, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Adam (Yang) Song, Sandeep Pandey, Luke Simon, Tianlong Chen, Xi Liu 7/2/2026

Diffusion-GR2: Diffusion Generative Reasoning Re-ranker

Uses diffusion language models instead of autoregressive decoding to speed up generative reasoning re-rankers for recommendation systems.

Ax Mehul Damani, Isha Puri, Idan Shenfeld, Jacob Andreas 7/2/2026

Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations

RLVR paradigm for LM training that balances verifiable rewards on objective tasks with human demonstrations for subjective attributes like style.

Ax Liyuan Zhu, Shengyu Huang, Amrita Mazumdar, Tianye Li, Zan Gojcic, Gordon Wetzstein, Iro Armeni, Shalini De Mello, Alex Trevithick 7/2/2026

World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video

Method for generating dynamic 3D Gaussian representations from monocular video using video models with pixel-aligned conditioning.

Ax Jeffrey Fang, Keyi Shen, Anutam Srinivasan, Glen Chou 7/2/2026

GPU-Parallel Linearization Error Bounds for Real-Time Robust Optimal Control of Nonlinear and Neural Network Dynamics

GPU-parallel linearization error bounds for real-time robust optimal control of nonlinear and neural network dynamics.

Ax Shayan Talaei, Abhinav Chinta, Devvrit Khatri, Amin Karbasi, Azalia Mirhoseini, Amin Saberi 7/2/2026

Distill to Detect: Exposing Stealth Biases in LLMs through Cartridge Distillation

Cartridge distillation method for detecting stealth entity and viewpoint biases in language models that hide preferences on relevant topics.

Ax Zhi Chen, Zhensu Sun, Yuling Shi, David Lo, Lingxiao Jiang 7/2/2026

Are Performance-Optimization Benchmarks Reliably Measuring Coding Agents?

Analysis of coding agent benchmarks (GSO, SWE-Perf, SWE-fficiency) examining reliability and conflation of runtime instability with agent capability.

Ax Chenyang Ma, Yue Yang, Radu Corcodel, Siddarth Jain, Andrew Wu, Chiori Hori, Diego Romeres 7/2/2026

FurnitureVLA: Learning Long-Horizon Bimanual Furniture Assembly with Vision-Language-Action Model

FurnitureVLA study of bimanual furniture assembly using Vision-Language-Action models with simulation and VR teleoperation for data collection.

Ax Chih-Han Yang, Dai-Jie Wu, Yun-Ping Huang, Ping-Chun Hsieh, Kenneth Marino, Shao-Hua Sun 7/2/2026

Language-Critique Imitation Learning from Suboptimal Demonstrations

Imitation learning framework using natural language critiques instead of scalar signals to learn from suboptimal demonstrations, enabling explicit reasoning about failures.

Ax Ziyu Chen, Yilun Zhao, Arman Cohan 7/2/2026

Measuring the Gap Between Human and LLM Research Ideas

Large-scale evaluation framework measuring how LLM-generated research ideas diverge from human researcher ideas across feasibility and novelty dimensions.

Ax Shalaleh Rismani, Roel Dobbe, AJung Moon 7/2/2026

From Silos to Systems: Process-Oriented Hazard Analysis for AI Systems

Framework applying system safety principles from established fields to identify and mitigate hazards from AI system component interactions and development processes.

Ax Zishang Jiang, Jinyi Han, Tingyun Li, Xinyi Wang, Sihang Jiang, Jiaqing Liang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao 7/2/2026

Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs

Method combining expert guidance with reinforcement learning to improve both effectiveness and diversity of exploration in LLM reasoning enhancement.

Ax Andrea Cera Palatsi, Samuel Martin-Gutierrez, Ana S. Cardenal, Max Pellert 7/2/2026

Large language models replicate and predict human cooperation across experiments in game theory

Empirical study demonstrating LLMs replicate and predict human cooperation patterns across game theory experiments, validating LLMs as behavioral simulators.

Ax Ziqian Bi, Yinzhi Wang, Tianyang Wang, Junfeng Hao, Benji Peng, Xinyuan Song 7/2/2026

CoT-X: An Adaptive Framework for Cross-Model Chain-of-Thought Transfer and Optimization

Framework for compressing chain-of-thought reasoning across different LLM sizes and architectures through semantic segmentation and adaptive summarization.

Ax Fatima Jahara, Mark Dredze, Sharon Levy 7/2/2026

Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

PRIME framework using logic grid puzzles to probe subtle social biases in LLM logical reasoning beyond overt bias suppression.

Ax Wayne Chi, Yixiong Fang, Arnav Yayavaram, Siddharth Yayavaram, Seth Karten, Qiuhong Anna Wei, Runkun Chen, Alexander Wang, Valerie Chen, Ameet Talwalkar, Chris Donahue 7/2/2026

GameDevBench: Evaluating Agentic Capabilities Through Game Development

GameDevBench evaluation testbed for multimodal coding agents combining complex codebase navigation with manipulation of visual game assets.

Ax Mei Chee Leong, Ying Gu, Hui Li Tan, Liyuan Li, Nancy Chen 7/2/2026

Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

Explicit Logic Channel for parallel logical reasoning validation of multimodal LLMs on zero-shot tasks to enhance interpretability.

Ax Guanyu Jiang, Zhaochen Su, Xiaoye Qu, Yi R. Fung 7/2/2026

XSkill: Continual Learning from Experience and Skills in Multimodal Agents

XSkill framework for continual learning in multimodal agents without parameter updates by extracting experiences and skills from past trajectories.

Ax Tianyu Xie, Jinfa Huang, Yuexiao Ma, Rongfang Luo, Yan Yang, Wang Chen, Yuhui Zeng, Yixuan Zou, Qingchuan Ma, Zhiqiang Lu, Ruize Fang, Xiawu Zheng, Jiebo Luo, Rongrong Ji 7/2/2026

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

SocialOmni benchmark for evaluating audio-visual social interactivity in omni-modal language models beyond static accuracy-centric tasks.

Ax Jiawen Wen, Penglei Sun, Wenjie Zhang, Suixuan Qiu, Weisheng Xu, Xiaofei Yang, Xiaowen Chu 7/2/2026

Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification

Rule-VLN benchmark for vision-language navigation emphasizing social compliance and semantic rules over pure geometric reachability.

Ax Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen 7/2/2026

EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale

EvoMaster foundational evolving agent framework for scientific discovery enabling iterative learning from trial and error at scale.

Ax Wanli Li, Bince Qu, Bo Pan, Jianyu Zhang, Zheng Liu, Pan Zhang, Wei Chen, Bo Zhang 7/2/2026

LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

LiteResearcher framework scaling reinforcement learning for deep research agents without hand-crafted synthetic data or real-world search instability.

Ax Munkhdelgerekh Batzorig, Purevbaatar Ganbold, Kyungbin Park, Pilkong Jeong, Kangbin Yim 7/2/2026

Mechanical Conscience: A Mathematical Framework for Dependability of Machine Intelligence

Mathematical framework for dependability in distributed collaborative intelligence addressing emergent risks in multi-agent systems under uncertainty.

Ax Ali Şenol, Garima Agrawal, Huan Liu 7/2/2026

Measuring Reasoning Quality in LLMs: A Multi-Dimensional Behavioral Framework

Multi-dimensional behavioral framework for measuring LLM reasoning quality beyond final-answer correctness across contextual variation and efficiency.

Ax Teddy Ferdinan, Bartłomiej Koptyra, Mikołaj Langner, Tomasz Adamczyk, Łukasz Radliński, Maciej Markiewicz, Aleksander Szczęsny, Stanisław Woźniak, Tymoteusz Romanowicz, Dzmitry Pihulski, Mateusz Zbrocki, Mateusz Śmigielski, Michał Rajkowski, Mateusz Biedka, Konrad Kiełczyński, Konrad Wojtasik, Jacek Duszenko, Jan Eliasz, Piotr Matys, Maria Bellaniar Ismiati, Latius Hermawan, Wiktoria Mieleszczenko-Kowszewicz, Anna Kubicka-Sowińska, Grzegorz Chodak, Karol Postawa, Paweł Zyblewski, Tomasz Szandała, Łukasz Sterczewski, Adrian Chajec, Paweł Niewiadomski, Piotr Gruber, Marcin Wdowikowski, Sławomir Czarnecki, Bartłomiej Kryszak, Dominik Drabik, Tomasz Kajdanowicz, Kamil Mamak, Paweł Preś, Katarzyna Paczkowska, Joachim Sobczuk, Tomasz Zięba, Jan Kocoń, Maciej Piasecki, Przemysław Kazienko 7/2/2026

Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches

Survey analyzing reasoning language model adoption across 28 scientific disciplines to identify adoption gaps outside hard sciences.

Ax Kaiqi Yang, Tai-Quan Peng, Sanguk Lee, Hui Liu 7/2/2026

Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation

Think-Before-Speak framework for LLM-based multi-agent simulation capturing internal evaluation processes and speaking intentions beyond observable dialogue.

Ax Dat Tien Nguyen, Thao Nguyen, Fadillah Adamsyah Maani, Huy M. Le, Muhammad Umer Sheikh, Numan Saeed, Muhammad Haris Khan, Salman Khan 7/2/2026

TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?

TerraBench benchmark evaluating whether agents can reason over heterogeneous Earth-system data including gridded, satellite, and geospatial inputs.

Ax Olly Styles, Sam Miller 7/2/2026

WorkBench Revisited: Workplace Agents Two Years On

WorkBench benchmark revisited showing dramatic agent capability improvements from 43% to 98% task completion and safety gains over two years.

Ax Sana Ayromlou, Purvi Sehgal, Pradyumna Narayana 7/2/2026

Diagnosing and Mitigating Compounding Failures in Agentic Persuasion via Taxonomic Strategy Retrieval

Analysis of compounding failures in multi-step agentic systems with taxonomic strategy retrieval solution for mitigating error drift in subjective tasks.

Ax Antonis Antoniades, Deepak Nathani, Ritam Saha, Alfonso Amayuelas, Ivan Bercovich, Zhaotian Weng, Vignesh Baskaran, Kunal Bhatia, William Yang Wang 7/2/2026

Heuresis: Search Strategies for Autonomous AI Research Agents Across Quality, Diversity and Novelty

Heuresis framework enabling autonomous AI research agents to explore performant, diverse, and novel ML ideas through composable research primitives.

Ax Runze Zhao, Dongruo Zhou, Sumit Kumar Jha, Nathaniel D. Bastian, Ankit Shah 7/2/2026

HiComm: Hierarchical Communication for Multi-agent Reinforcement Learning

HiComm framework for hierarchical communication in multi-agent reinforcement learning to leverage observation structure in cooperative environments.

Ax Yichen Guo, Kai Tang, Fenglai Lin, Yiding Sun, Dongxu Zhang, Wenya Wang, Lin William Cong, Shanghang Zhang 7/2/2026

FADE: Mitigating Hallucinations by Reducing Language-Prior Dominance in Large Vision-Language Models

Investigates language-prior dominance as a source of hallucinations in large vision-language models and proposes FADE as a mitigation method.

Ax Minwoo Yu, Young-guk Ha 7/2/2026

Relevance Is Not Permission: Warranted Attention for Value Contributions

Formalizes the permission problem of distinguishing relevant attention items from actual supporting evidence in retrieval-augmented generation and ranking tasks.

Ax Wenjia Jiang, Zongyuan Cai, Yuanhang Shao, Chenru Wang, Boyan Han, Zhixue Song, Keyu Chen, Shengwei An, Xu Yang, Zhou Yang 7/2/2026

ManimAgent: Self-Evolving Multimodal Agents for Visual Education

ManimAgent demonstrates self-evolving multimodal agents with multi-round reflection that persist learning across tasks, generating Manim animations from scientific papers.

Ax Irena Saracay, Ludwig Schmidt, Carlos Guestrin 7/2/2026

Beyond expert users: agents should help users construct preferences, not just elicit them

Framework for agents helping users construct preferences through domain knowledge learning rather than just eliciting predefined preferences from expert users.

Ax Yongbin Kim, Yashar Talebirad, Osmar R. Zaiane 7/2/2026

Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering

HASTE hierarchical multi-agent system for ML engineering that accumulates skills across competitions, reducing redundant compute through cross-competition knowledge transfer.

Ax Wanxia Cao, Chengzhen Duan, Pei Fu, Pengzhi Gao, Niu Lian, Fazhan Liu, Hui Liu, Heng Qu, Qinzhuo Wu, Zhehao Yu, Tongbo Chen, Shiqi Cui, Anan Du, Shukai Jia, Yuanfa Li, Wei Liu, Yike Liu, Wenchao Lu, Zhenbo Luo, Haoyuan Sun, Jiatong Sun, Cheng Tan, Yajie Wang, Changqiao Wu, Tao Xiong, Jiahui Yang, Yuxuan Yuan, Ruoceng Zhang, Shaojie Zhang, Jian Zhu, Jian Luan, Cong Zou 7/2/2026

Xiaomi-GUI-0 Technical Report

Xiaomi-GUI-0 technical report on GUI agents using vision-language models for end-to-end task completion in real applications through interface interactions.

Ax Mark Russinovich, Yanan Cai, Ahmed Salem 7/2/2026

Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique

Chain & Hash fingerprinting technique for LLM ownership verification and misuse detection with transparency, efficiency, persistence, robustness, and unforgeability.

Ax Sheila Schoepp, Mehran Taghian, Shotaro Miwa, Yoshihiro Mitsuka, Shadan Golestan, Osmar Zaïane 7/2/2026

Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Compares PPO and SAC reinforcement learning algorithms for hardware fault tolerance in autonomous machines, enabling adaptation to changing conditions.

Ax Noah Y. Siegel, Nicolas Heess, Maria Perez-Ortiz, Oana-Maria Camburu 7/2/2026

Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations

Analyzes faithfulness of LLM self-explanations across 75 models from 13 families, examining tradeoffs between explanation conciseness and comprehensiveness.

Ax Tasnim Shahriar 7/2/2026

Comparative Analysis of Lightweight CNNs for Resource-Constrained Devices: Predictive Performance, Efficiency Trade-offs, and Initialization Effects

Reproducible benchmark comparing seven lightweight CNNs on CIFAR-10/100 and Tiny ImageNet under common training protocols with efficiency trade-off analysis.

Ax Davide D'Ascenzo, Sebastiano Cultrera di Montesano 7/2/2026