Isolater - Feed

Ax Maureese Williams, Dymitr Nowicki 16d ago

GATS: Graph-Augmented Tree Search with Layered World Models for Efficient Agent Planning

GATS: Planning framework combining tree search with layered world models to reduce LLM inference calls during agent planning, improving efficiency and reducing stochasticity.

Ax Zongxia Li, Zhongzhi Li, Yucheng Shi, Ruhan Wang, Junyao Yang, Zhichao Liu, Xiyang Wu, Anhao Li, Yue Yu, Ninghao Liu, Lichao Sun, Haotao Mi, LeoweiLiang 16d ago

Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

Long-Horizon-Terminal-Bench: Benchmark with 46 long-horizon terminal tasks and dense reward grading to evaluate agent capabilities beyond simple well-specified problems.

Ax Joseph K. Miller 16d ago

A Formalization of the Mean-Field Derivation of the Vlasov Equation: AI-Assisted Lean Formalization as a Strategy Game

AI-assisted formalization of mathematical proofs in the Lean 4 proof assistant, framed as a strategy game in which the AI directs the formalization of research results.

Ax Kunbo Zhang, Lei Fu, Zeyu Wang, Zijing Liu, Kejian Tong 16d ago

ARCANA: A Reflective Multi-Agent Program Synthesis Framework for ARC-AGI-2 Reasoning

ARCANA: Multi-agent framework for ARC-AGI-2 task solving using perception agents, program synthesis, symbolic execution, and reflective refinement under computational constraints.

Ax Saroj Gopali, Bipin Chhetri, Deepika Giri, Sima Siami-Namini, Akbar Siami Namin 16d ago

Neuro-Agentic Control: A Deep Learning-based LLM-Powered Agentic AI Framework for Controlling Security Controls

Neuro-agentic control framework combines LLMs with neural networks for industrial IoT security monitoring while mitigating hallucination risks in closed-loop control.

Ax Tan-Minh Nguyen, Hoang-Trung Nguyen, Huu-Dong Nguyen, Dinh-Truong Do, Thi-Hai-Yen Vuong, Le-Minh Nguyen 16d ago

L-MAD: A Systematic Evaluation of Multi-Agent Debate Structures in Legal Reasoning

L-MAD framework systematically evaluates multi-agent debate structures for legal reasoning tasks, testing different agent personas and aggregation methods.

Ax Runhan Shi, Quan Zhou, Yuqian Xu, Shuai Yang, Xin Wu, Zitong Zhou, Hui Liu, Bin Cha, Zheming Wang, Liya Li, Wei Wei, Haoyuan Hu, Jun Xu 16d ago

MedRealMM: A Real-World Multimodal Benchmark for Chinese Online Medical Consultation

MedRealMM is a large-scale benchmark for evaluating LLMs in Chinese online medical consultation using real multimodal data and clinical quality metrics.

Ax Peng Kuang, Haibo Jin, Xiaoyu Han, Yanli Wang, Xiaopeng Yuan, Ye Yu, Kaidi Xu, Haohan Wang 16d ago

KV-PRM: Efficient Process Reward Modeling via KV-Cache Transfer for Multi-Agent Test-Time Scaling

Efficient process reward modeling via KV-cache transfer for scaling multi-agent systems, reducing the quadratic cost of scoring long trajectories.
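
A rough illustration of where the quadratic term comes from. It assumes a process reward model that re-encodes the entire trajectory prefix each time it scores a new step. The function name and token counts are hypothetical, and the sketch counts only prefill tokens, not attention FLOPs:

```python
def tokens_encoded(num_steps: int, tokens_per_step: int, reuse_kv_cache: bool) -> int:
    """Count prefill tokens needed to score every step of one trajectory (hypothetical model)."""
    total = 0
    for t in range(1, num_steps + 1):
        if reuse_kv_cache:
            total += tokens_per_step        # only the newly appended step is encoded
        else:
            total += t * tokens_per_step    # the full t-step prefix is encoded again
    return total


print(tokens_encoded(100, 50, reuse_kv_cache=False))  # 252500, grows as O(T^2)
print(tokens_encoded(100, 50, reuse_kv_cache=True))   # 5000, grows as O(T)
```

Reusing cached keys and values reduces the per-trajectory cost from L·T(T+1)/2 encoded tokens to L·T. How KV-PRM actually transfers the cache between agents is described in the paper, not here.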

Ax Dan C. Hsu, Luke Lu 16d ago

Scoped Verification for Reliable Long-Horizon Agentic Context Evolution under Distribution Shift

Verification mechanism for agentic context evolution in long-horizon LLM deployments with persistent system instructions under distribution shift.

Ax Izumi Takahara, Teruyasu Mizoguchi 16d ago

Toward Auditable AI Scientists: A Hypothesis Evolution Protocol for LLM Agents

Protocol for making LLM agents auditable in scientific discovery by explicitly tracking hypothesis evolution, tests, and belief updates.

Ax Matěj Kripner, Milan Straka 16d ago

OpenProver: Agentic and Interactive Theorem Proving with Lean 4

Open-source system for LLM-driven automated theorem proving with Lean 4 verification, using a Planner-Worker-Verifier architecture.

Ax Yanzhen Chen, Zihan Xu, Xiaocheng Zhang, Zhiting Fan, Weiqi Zhai, Hongxia Xu, Zuozhu Liu 16d ago

LongMedBench: Benchmarking Medical Agents for Long-Horizon Clinical Decision-Making

Benchmark for evaluating LLM-based medical agents on long-horizon clinical decision-making using real EHR data across repeated visits and evolving treatments.

Ax Nuocheng Yang, Sihua Wang, Zihan Chen, Tony Q. S. Quek, Changchuan Yin 16d ago

Communication-Efficient Digital-Twin Coordination for Heterogeneous LLM Embodied Agents over Computing Power Networks

Coordination framework for heterogeneous LLM embodied agent teams with communication-efficient mechanisms for physical AI deployments.

Ax Jingbo Chen, He Wang, Wei Yuan, Yuqiao Lai, Zhenyan Lu 16d ago

Fictional Worldbuilding: Multi-Agent LLM Collaboration with Hierarchical Context Compression and Iterative Review

Multi-agent LLM collaboration system for fictional worldbuilding using hierarchical context compression and iterative review for consistency.

Ax Johannes Schmitt, Tim Gehrunger, Jasper Dekoninck, Gergely Bérczi, Uri Kreitner, Liam Price, David Holmes 16d ago

ProofCouncil: An LLM Agent for Solving Open Mathematical Problems

LLM agent with an author-critic architecture for solving open mathematical problems through an agentic workflow inspired by mathematical practice.

Ax Jiayu Yao, Yiwei Wang, Anmeng Zhang, Zhe Sun, Songsong Wang, Lingrui Mei, Yuyao Ge, Shenghua Liu 16d ago

Multimodal Reward Hacking in Reinforcement Learning

Study of reward hacking in multimodal LLM reinforcement learning across VQA and safety tasks with varying reward designs and model scales.

Ax Sanjana Pedada, Aditya Dhavala, Neelraj Patil 16d ago

Shared Selective Persistent Memory for Agentic LLM Systems

Memory mechanism for multi-turn agentic LLM systems that selectively persists configuration, domain constraints, and tool-use patterns across sessions.

Ax Chongyu Qu, Can Cui, Zhengyi Lu, Junchao Zhu, Tianyuan Yao, Junlin Guo, Juming Xiong, Yanfan Zhu, Yuechen Yang, Bennett A. Landman, Yuankai Huo 16d ago

SAGEAgent: A Self-Evolving Agent for Cost-Aware Modality Acquisition in Multimodal Survival Prediction

Self-evolving agent for multimodal survival prediction that actively reasons about which diagnostic modalities to acquire given cost constraints.

Ax Yuan Cao, Haiqian Yang 16d ago

Beyond Fixed Representations: The Vocabulary and Verifier Gaps in Open-Ended AI

Examines structural limitations in AI systems for reasoning and code generation when representational frames are fixed rather than open-ended.

Ax Jan Gronewald, Andreas Emrich, Nijat Mehdiyev 16d ago

Knowledge Graphs and Explainable AI as Complementary Resources for Urban Mining

Combines knowledge graphs and explainable AI techniques for pre-demolition assessment in urban mining decision support.

Ax Hannah M. Liu, Rhea Saxena, Shiv Asthana 16d ago

TrustX Agent Risk Classification Framework (ARC): Risk-Tiering Internally Created Agentic AI Systems

Risk classification framework for governing agentic AI systems across enterprise and public sector applications with seven system types.

Ax Kaiji Zhou, Ales Leonardis, Yue Feng 16d ago

Agora: Enhancing LLM Agent Reasoning Via Auction-Based Task Allocation

Framework for LLM agents to orchestrate expert models and tools via auction-based task allocation considering performance and cost efficiency.
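
A toy sealed-bid allocation in the spirit of the summary: each candidate (expert model or tool) bids its expected quality and cost, and the orchestrator picks the best trade-off. The `Bid` fields, the linear utility, and `cost_weight` are assumptions for illustration, not Agora's actual mechanism:

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str
    expected_quality: float  # hypothetical self-estimate of task success, in [0, 1]
    cost: float              # hypothetical expected cost, e.g. in dollars


def allocate(bids: list[Bid], cost_weight: float = 1.0) -> Bid:
    """Award the task to the bid with the highest quality-minus-weighted-cost utility."""
    return max(bids, key=lambda b: b.expected_quality - cost_weight * b.cost)


bids = [
    Bid("large_llm", expected_quality=0.90, cost=0.30),
    Bid("small_llm", expected_quality=0.70, cost=0.02),
    Bid("calculator_tool", expected_quality=0.60, cost=0.0),
]
print(allocate(bids).agent)                   # small_llm: cost dominates
print(allocate(bids, cost_weight=0.1).agent)  # large_llm: quality dominates
```

Changing `cost_weight` shifts the winner between the expensive and the cheap candidate, which is the performance-versus-cost tension the summary refers to.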

Ax Nicolas Koller, Andreas u. Schmidt 16d ago

REFORGE: A Method for Benchmarking LLMs' Reverse Engineering Capabilities in Decompiled Binary Function Naming

Benchmark for evaluating LLM reverse engineering capabilities on decompiled binary function naming tasks, addressing measurement gaps in security applications.

Ax Qingzhuo Wang, Ruiyang Qin, Zhenxin Qin, Wen Shen, Zhihua Wei 16d ago

A Unified Approach to Interpreting Knowledge Distillation for Large Language Models via Interactions

Unified framework analyzing knowledge distillation mechanisms in LLMs through interaction decomposition to understand why KD methods work.

Ax Farica Zhuang, Seong Woo Han, Zixuan Wen, Shu Yang, Yize Zhao, Li Shen 16d ago

iLENS: Interpretable LLM-Guided Mixture-of-Experts for Neuroimaging Survival Analysis

Interpretable LLM-guided mixture-of-experts model for Alzheimer's disease survival prediction combining neuroimaging with natural language reasoning.

Ax Ian Colbert, Eashan Dash, Pablo Monteagudo-Lago, Juan Amboage, Srinidhi N, Giuseppe Franco, Nicholas J. Fraser, Arun Ramachandran 16d ago

Signed Symmetric Quantization for Few-Bit Integers

Quantization technique for few-bit integer representation addressing asymmetric clipping issues in signed symmetric quantizers.
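
For background, a conventional b-bit signed symmetric quantizer clips to the two's-complement range [-2^(b-1), 2^(b-1) - 1] and uses scale = max|x| / (2^(b-1) - 1). As a result, the most negative level is never produced for inputs inside the clipping range. The sketch shows this baseline behavior at 2 bits, where one of four levels goes unused. It does not reproduce the paper's proposed quantizer:

```python
def symmetric_quantize(values: list[float], bits: int):
    """Baseline signed symmetric quantizer with a max-abs scale (illustration only)."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1      # largest positive integer level
    qmin = -(2 ** (bits - 1))       # most negative two's-complement level
    scale = (max(abs(v) for v in values) / qmax) or 1.0  # avoid dividing by zero for all-zero input
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    grid = list(range(qmin, qmax + 1))
    return codes, scale, grid


codes, scale, grid = symmetric_quantize([-1.0, -0.4, 0.0, 0.6, 1.0], bits=2)
print(codes)                             # [-1, 0, 0, 1, 1]
print(grid)                              # [-2, -1, 0, 1]
print(sorted(set(grid) - set(codes)))    # [-2]: a level the max-abs scale never reaches
```

At 2 bits the unused level is a quarter of the grid, which is why the asymmetry matters most for few-bit integers.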

Ax Ali Kayyam 16d ago

Sticky Routing: Training MoE Models for Memory-Efficient Inference

Training method for Mixture-of-Experts models that uses a differentiable routing-consistency loss to reduce weight swapping during inference on edge devices.
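
To make the memory argument concrete, assume a device that keeps only one expert's weights resident at a time, so every change of the selected expert between consecutive tokens forces a weight swap. The sketch counts those swaps and computes a hypothetical penalty that is smooth in the router probabilities: one minus the inner product of consecutive routing distributions. A training loss could minimize a term like this, but it is not the paper's exact loss:

```python
def count_swaps(expert_ids: list[int]) -> int:
    """Number of times the top-1 expert changes between consecutive tokens."""
    return sum(1 for prev, cur in zip(expert_ids, expert_ids[1:]) if prev != cur)


def consistency_penalty(router_probs: list[list[float]]) -> float:
    """Hypothetical penalty: mean of 1 - <p_t, p_{t-1}> over consecutive token pairs."""
    pairs = list(zip(router_probs, router_probs[1:]))
    if not pairs:
        return 0.0
    return sum(1.0 - sum(a * b for a, b in zip(p, q)) for p, q in pairs) / len(pairs)


jumpy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
sticky = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(count_swaps([0, 1, 0, 1]), consistency_penalty(jumpy))   # 3 1.0
print(count_swaps([0, 0, 1, 1]), consistency_penalty(sticky))  # 1 0.3333333333333333
```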

Ax Kehan Guo, Yili Shen, Yujun Zhou, Yue Huang, Chujie Gao, Shiyi Du, Xiangliang Zhang 16d ago

Reward Transport: Property Control in Flow Matching via Noise-Space Alignment

Technique using optimal transport coupling in flow matching to embed controllable structure for molecular property generation.

Ax Qianli Liu, Kaibin Guo, Zicong Hong, Peng Li, Fahao Chen, Haodong Wang, Jian Lin, Song Guo 16d ago

Director: Accelerating Distributed MoE Serving via Online Proactive Expert Placement

Distributed serving system for Mixture-of-Experts models using online proactive expert placement to optimize communication and computation latencies.

Ax Thinh T. H. Nguyen, Le-Tuan Nguyen, Minh-Duong Nguyen, Nhi Trinh, Anh Tran Nam Nguyet, Dung D. Le, Kok-Seng Wong 16d ago

HERO: A Heterogeneity-Aware Benchmark Library for Federated Continual Learning

Standardized benchmark library for federated continual learning with consistent evaluation methodology across heterogeneous client settings.

Ax Tao Lu, Haoyu Wang, Zonghui Wang, Keshen Xiang, Jiaheng Zhang, Wenzhi Chen 16d ago

Accelerating GPU Inference of Large Language Models with Moderately Unstructured Sparse Weight Matrices

GPU kernel optimization for sparse matrix multiplication enabling efficient LLM inference acceleration with moderately unstructured pruned weight matrices.

Ax Georgios Laskaris, Reuben Brasher, Niki van Stein, Elena Raponi, Thomas Bäck, Florian Neukart 16d ago

LLM-Driven Evolutionary Generation of Multi-Objective Bayesian Optimization Algorithms

Uses LLMs as mutation and crossover operators to evolve multi-objective Bayesian optimization algorithms automatically with hyperparameter tuning.

Ax Joshua Pickard, Wei Qi, Na Li, Ann Woolley, Lisa Cosimi, Roy Kishony, Deborah Hung 16d ago

EHR-MPC: Inference-Time Control for Sepsis Treatment with Generative Patient Digital Twins

Framework decoupling patient dynamics learning from treatment optimization for sepsis using generative EHR model digital twins and inference-time control.

Ax Padam Jung Thapa, Abdullah Bin Naeem, Ayon Dey, Anav Katwal, Md Tamjidul Hoque 16d ago

Multi-Conditioned Diffusion Synthesis of Sand Boils for Low-Resource Earthen-Levee Inspection

Diffusion-based synthetic image generation pipeline for sand boil defect detection on earthen levees using ControlNet and DreamBooth fine-tuning.

Ax Hyunjin Seo, Hyeon Hwang, Gyubok Lee, Jay Shin, Jimin Park, Taesoo Kim, Sanghoon Lee, Hongjoon Ahn, Sungjun Han, Sangwon Jung 16d ago

TheBioCollection: Unified Pre-Training Scale LLM Corpus for Biology

Unified corpus for biology domain combining heterogeneous biological databases and resources for pretraining specialized large language models.

Ax Sunshine Jiang, John Marangola, David Zhang, Raghuram Kowdeed, Ruiyang Luo, Nitish Dashora, Richard Li, Pulkit Agrawal, Zhang-Wei Hong 16d ago

Prompt-Driven Exploration

Method using LLMs and VLAs as exploration guidance in reinforcement learning to escape weak policies through natural language prompts.

Ax Ning Liu, Kalle Kujanpää, Zhaoxuan Zhu, P Aditya Sreekar, Kaiwen Liu, Chuanneng Sun, Jorge Marchena Menendez, Matthew Bales, Tianyu Yang, Shahnawaz Alam, Rose Yu, Baoyuan Liu, Kristina Klinkner, Shervin Malmasi 16d ago

Eluna: An Agentic LLM System for Automating Warehouse Operations with Reasoning and Task Execution

Production-deployed agentic LLM system using graph-guided multi-agent framework for reliable Standard Operating Procedure execution in warehouse operations.

Ax Berkay Anahtarci 16d ago

NL-PAC: Specification Ambiguity and Certified Minimax Risk Floors in LLM-Mediated Supervision

Framework analyzing specification ambiguity in natural language supervision for LLMs, introducing NL-PAC to measure minimax risk bounds in label generation.

Ax Hantao Zhang, Jinru Sui, Ed Li, Dirk Bergemann, Zhuoran Yang 16d ago

MultiView-Bench: A Diagnostic Benchmark for World-Centric Multi-View Integration in VLMs

Benchmark dataset for evaluating vision-language models' ability to integrate multi-view observations into coherent 3D scene understanding.

Ax Yuri Ishitoya, Jeremy Siburian, Masashi Hamaya, Kuniaki Saito, Cristian C. Beltran-Hernandez, Mai Nishimura 16d ago

CLAP: Direct VLM-to-VLA Adaptation via Language-Action Grounding

Research on adapting pretrained vision-language models to vision-language-action models for robotics with minimal architectural changes to preserve VLM contributions.

Ax Viraaji Mothukuri, Reza M. Parizi 16d ago

The Patchwork Problem in LLM-Generated Code

Identifies structural incoherence in LLM-generated code where locally valid patches fail globally due to missing configs, imports, or authentication guards.

Ax Sijia Gu, Noor Nashid, Ali Mesbah 16d ago

SCATE: Learning to Supervise Coding Agents for Cost-Effective Test Generation

The SCATE framework teaches coding agents to generate better tests by addressing the lazy-generation problem, which causes premature task termination and low code coverage.

Ax Brent Kong, Tejas Ram, Tony Yue Yu 16d ago

AlphaZero in Sparsely Rewarded Games: Limits and Auxiliary Supervision

Analyzes AlphaZero's performance gap in sparsely rewarded games such as Connect Four and Chomp, and studies the effectiveness of auxiliary supervision.

Ax Shrimon Mukherjee, Kishalay Das, Partha Basuchowdhuri, Pawan Goyal, Niloy Ganguly 16d ago

Model Agnostic Graph Prompt Learning for Crystal Property Prediction

Model-agnostic graph prompt learning reduces parameter count and domain-expertise requirements for crystal property prediction with graph neural networks.

Ax Ajay Narayanan Sridhar, Ronak Singh, Mehrdad Mahdavi, Vijaykrishnan Narayanan 16d ago

Correlation-Aware Contextual Bandits with Surrogate Rewards for LLM Routing

Studies contextual bandits with correlated arms and surrogate rewards for LLM routing, handling noisy auxiliary reward information.

Ax Chao Wang, Lingling Li, Fang Liu, Licheng Jiao 16d ago

Evolutionary Intelligence for Scientific Discovery: From Evolutionary Computation to Cumulative Discovery Systems

Reviews evolutionary computation as basis for autonomous scientific discovery systems that integrate experimental feedback and human guidance in open-ended exploration.

Ax Jialun Cao, Xinru Yan, Songqiang Chen, Yaojie Lu, Zhongxin Liu, Shing-Chi Cheung 16d ago

Inside the Skill Market: From Software Engineering Activities to Reusable Agent Skills

Examines emerging AI agent skill repositories and marketplaces, analyzing what software engineering activities become reusable agent skills.

Ax Yang Chen, Yunwen Li, Yufan Shen, Minghao Liu, Tianyu Zheng, Bin Fu, Qunshu Lin, Zhi Yu, Botian Shi 16d ago

OmniMapBench: Benchmarking Visual-Centric Reasoning on Diverse Map Documents

OmniMapBench introduces a benchmark of 2,096 manually annotated examples for evaluating visual-centric reasoning in LVLMs on map documents.

Ax Camila Piscioneri Magalhães, Lucas Pascotti Valem 16d ago

Integrating Large Language Models and Graph Convolutional Networks for Semi-Supervised Image Classification

Integrates LLMs with Graph Convolutional Networks to improve semi-supervised image classification by addressing graph construction challenges.

Ax Bartosz Ziółko, Kacper Dobrzeniewski 16d ago

Augmenting Fundamental Analysis with Large Language Models: A RAG-Based System for Generating Investor Briefs

RAG-based system augments fundamental company analysis by combining LLMs with SEC filings and macroeconomic data via API calls to GPT-4o.