Isolater - Feed

Ax Achint Mehta 29d ago

Reasoning effort, not tool access, buys first-try reliability in agentic code generation: an observational study

Observational study of 90 agentic code generation runs showing reasoning effort matters more than tool access for first-try reliability.

Ax Zhuowei Chen, Xiang Lorraine Li 29d ago

Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

Neuron-aware data selection approach for annotation-free LLM self-distillation in specialized domains without human supervision.

Ax Donghyun Lee, Jitesh Chavan, Duy Nguyen, Sam Huang, Liming Jiang, Priyadarshini Panda, Timo Mertens, Saurabh Shukla 29d ago

OrbitQuant: Data-Agnostic Quantization for Image and Video Diffusion Transformers

Data-agnostic quantization method for diffusion transformers enabling efficient post-training quantization without requiring calibration data.

Ax Junhao Shi, Siyin Wang, Xiaopeng Yu, Li Ji, Jingjing Gong, Xipeng Qiu 29d ago

Learning to Move Before Learning to Do: Task-Agnostic pretraining for VLAs

Task-agnostic pretraining approach for Vision-Language-Action models separates physical competence learning from semantic alignment to reduce expert demonstration requirements.

Ax Jiale Amber Wang, Kaiyuan Wang, Pengyu Nie 29d ago

TestEvo-Bench: An Executable and Live Benchmark for Test and Code Co-Evolution

Executable benchmark for evaluating test generation agents on code-test co-evolution with real semantic verification of test-code relationships.

Ax Xuehui Wang, Xuankun Yang, Wei Shen 29d ago

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

Entropy-aware token pruning method for vision-language models to reduce redundant visual tokens while preserving critical information under dense instructions.

Ax Gil Harari, Yoel Zimmermann, Ola Tangen Kulseng, Laura Zichi, Chuin Wei Tan, Marc L. Descoteaux, Boris Kozinsky 29d ago

Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Systematic comparison of matrix-structured optimizers (Muon, SOAP) versus Adam for training machine learning interatomic potentials with improved efficiency.

Ax Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song 29d ago

DemoPSD: Disagreement-Modulated Policy Self-Distillation

DemoPSD method improves LLM reasoning via self-distillation with disagreement modulation to reduce overfitting and improve cross-domain generalization.

Ax Yuxuan Li, Lingxi Xie, Xinyue Huo, Jihao Qiu, Jiacheng Shao, Pengfei Chen, Jiannan Ge, Kaiwen Duan, Qi Tian 29d ago

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

Benchmark dataset (DramaSR-532K) and reasoning LLM approach for speaker recognition in TV dramas using dialogue attribution tasks.

Ax Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng 29d ago

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Program-as-Weights paradigm compiles natural-language specifications into compact, locally-executable neural artifacts for fuzzy functions.

Ax Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, Verna Dankers 29d ago

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

LACUNA testbed evaluates localization precision in LLM unlearning methods that remove sensitive training data and PII.

Ax Róisín Luo, James McDermott, Colm O'Riordan 29d ago

Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition

Model-agnostic global interpretability method for understanding perturbation robustness in image models via spectral analysis.

Ax Hana Chockler, David A. Kelly, Daniel Kroening, Youcheng Sun 29d ago

Causal Explanations for Image Classifiers

Black-box method for computing image classifier explanations using formal causal theory and actual causality definitions.

Ax Yuhan Li, Wei Zhang, Juan Chen, Jiangjia Yan, Peng Xiangli, Liangze Yin 29d ago

ADMC: Attention-based Diffusion Model for Missing Modalities Feature Completion

Attention-based diffusion model for completing missing modalities in multimodal emotion and intent recognition tasks.

Ax Saurabh Ranjan, Brian Odegaard 29d ago

Psychological Imagination Networks Show Cross-Population Centrality and Clustering Alignment in Humans That Large Language Models Fail to Replicate

Psychological network analysis comparing mental imagery structure between humans and LLMs across populations and languages.

Ax Hanyu Wang, Ruohan Xie, Yutong Wang, Guoxiong Gao, Xintao Yu, Bin Dong 29d ago

Aria: An Agent For Retrieval and Iterative Auto-Formalization via Dependency Graph

Aria agent system for theorem formalization in Lean using retrieval and iterative auto-formalization to improve LLM accuracy in mathematics.

Ax Raj Ghugare, Roger Creus Castanyer, Catherine Ji, Kathryn Wantlin, Jin Schofield, Karthik Narasimhan, Benjamin Eysenbach 29d ago

BuilderBench: The Building Blocks of Intelligent Agents

BuilderBench benchmark for developing AI agents that learn through interaction and exploration rather than mimicry alone.

Ax Yankai Jiang, Yujie Zhang, Peng Zhang, Wenjie Li, Yichen Li, Jintai Chen, Xiaoming Shi, Shihui Zhen 29d ago

Ophiuchus: Incentivizing Tool-augmented "Think with Images" for Joint Medical Segmentation, Understanding and Reasoning

Ophiuchus tool-augmented framework enabling medical MLLMs to dynamically focus on fine-grained visual regions for clinical reasoning tasks.

Ax Masum Hasan, Junjie Zhao, Ehsan Hoque 29d ago

HAL: Inducing Human-likeness in LLMs with Alignment

HAL framework for aligning language models to human-likeness through interpretable, data-driven alignment methods.

Ax Peixin Huang, Yaoxin Wu, Yining Ma, Cathy Wu, Wei Zhang, Wen Song 29d ago

A General Neural Backbone for Mixed-Integer Linear Optimization via Dual Attention

Attention-driven neural backbone for solving mixed-integer linear programming using graph neural networks with improved representation power.

Ax Menglin Xia, Xuchao Zhang, Shantanu Dixit, Paramaguru Harimurugan, Rujia Wang, Victor Ruhle, Robert Sim, Chetan Bansal, Saravan Rajmohan 29d ago

Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity

Memora: a harmonic memory representation for agents that balances abstraction and specificity to enable efficient, context-aware retrieval.

Ax Fengyuan Liu, Jay Gala, Nilaksh, Dzmitry Bahdanau, Siva Reddy, Hugo Larochelle 29d ago

BRIDGE: Predicting Human Task Completion Time From Model Performance

BRIDGE psychometric framework predicting human task completion time from model performance without direct human annotations.

Ax Anirudh Ajith, Amanpreet Singh, Jay DeYoung, Nadav Kunievsky, Austin C. Kozlowski, Oyvind Tafjord, James Evans, Daniel S. Weld, Tom Hope, Doug Downey 29d ago

PreScience: A Dataset and Benchmark for Scientific Forecasting

PreScience dataset and benchmark for forecasting scientific advances using 98K AI papers with citations and author histories.

Ax Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Haoxuan Li, Hao Wang, Shijian Wang, Guanting Dong, Jiajie Jin, Yinuo Wang, Yuan Lu, Ji-Rong Wen, Zhicheng Dou, Zhouchen Lin 29d ago

OmniGAIA: Towards Native Omni-Modal AI Agents

OmniGAIA benchmark evaluating omni-modal AI agents with vision, audio, and language integration for complex reasoning and tool usage.

Ax Giona Fieni, Joschua Wüthrich, Marc-Philippe Neumann, Christopher H. Onder 29d ago

Learning-based Multi-agent Race Strategies in Formula 1

Reinforcement learning approach for multi-agent Formula 1 race strategy optimization, modeling energy, tire degradation, and competitor behavior.

Ax Drew Prinster, Clara Fannjiang, Ji Won Park, Kyunghyun Cho, Anqi Liu, Suchi Saria, Samuel Stanton 29d ago

Conformal Policy Control

Conformal policy control method using safe reference policies to regulate untested agent policies, balancing exploration and safety constraints.

Ax Boyuan Guan, Wencong Cui, Levente Juhasz 29d ago

A Dual-Helix Governance Approach Towards Reliable Agentic Artificial Intelligence for WebGIS Development

Dual-helix governance framework stabilizing agentic AI for WebGIS by using knowledge graphs and protocol enforcement to address context and instruction failures.

Ax Andreas Schlapbach 29d ago

Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

Formal verification framework for LLM agent protocols, comparing Schema-Guided Dialogue and Model Context Protocol for agent-tool integration.

Ax Pablo de los Riscos, Fernando J. Corbacho, Michael A. Arbib 29d ago

Working Paper: Towards a Category-theoretic Comparative Framework for Artificial General Intelligence

Category-theoretic framework for defining and comparing AGI systems, addressing the lack of formal AGI definitions and benchmarking approaches.

Ax Niklas Herbster, Martin Zborowski, Alberto Tosato, Gauthier Gidel, Tommaso Tosato 29d ago

Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence

Activation steering methods to prevent LLM misalignment at runtime by manipulating linear structures in activation space.

Ax Trilok Padhi, Ramneet Kaur, Krishiv Agarwal, Adam D. Cobb, Daniel Elenius, Manoj Acharya, Colin Samplawski, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha, Ugur Kursuncu, Anirban Roy 29d ago

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

Framework for interpreting temporal evolution of concepts in LLM agents using conformal inference, improving transparency of sequential behavior.

Ax Sen Cui, Jingheng Ma 29d ago

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

Hamiltonian-based approach to generative world modeling combining video synthesis, 3D scene reconstruction, and latent predictive models.

Ax Zenghui Zhou, Man Li, Xiaoke Fang, Xinyi Zhou, Weibin Lin, Zheng Zheng 29d ago

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

LGMT framework uses first-order logic for oracle-free evaluation of LLM reasoning robustness under logically equivalent transformations.

Ax Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo, Jingyi Yang, Yi Liu, Tingfeng Hui, Xinyu Yuan, Li Sun, Sen Su, Jing Shao 29d ago

A Unified Framework for the Evaluation of LLM Agentic Capabilities

Unified evaluation framework for LLM agentic capabilities that separates model capability from benchmark implementation choices for fair cross-benchmark comparison.

Ax Tong Bai, Zhenglin Wan, Pengfei Zhou, Xingrui Yu, Yang You, Ivor W. Tsang 29d ago

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

SkillDAG framework models inter-skill relationships as typed directed graphs for LLM agent skill selection at scale, improving over similarity-matching approaches.

Ax Wojciech Zarzecki, Jan Dubiński, Sebastian Cygert 29d ago

The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

Analysis of benchmark contamination detection methods for LLMs, showing limitations of statistical tools in realistic auditing scenarios with distribution shift.

Ax Muhammad Zia Hydari, Raja Iqbal 29d ago

The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents

Study of stochasticity sources in AI agents, examining how foundation models and orchestration loops produce variability in planning, tool calls, and outputs.

Ax Xinbao Qiao, Xianglong Du, Wei Liu, Jingqi Zhang, Peihua Mai, Meng Zhang, Yan Pang 29d ago

When Sample Selection Bias Precipitates Model Collapse

Research on model collapse from recursive training on synthetic data and how sample selection bias affects model verification in low-resource regimes.

Ax Tingyang Chen, Shuo Lu, Kang Zhao, Weicheng Meng, Hanlin Teng, Tianhao Li, Chao Li, Xule Liu, Jian Liang, Zhizhong Zhang, Yuan Xie, Heng Qu, Kun Shao, Jian Luan 29d ago

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

HarnessX: foundry for composable, adaptive agent harnesses combining prompts, tools, memory, and control flow with systematic evolution from execution traces.

Ax Sergei Trashchenkov 29d ago

Power Systems Agent Benchmark: Executable Evaluation of AI Agents in Electric Power Engineering

Power Systems Agent Benchmark: executable evaluation framework for tool-using AI agents applied to power engineering tasks with concrete outcome verification.

Ax Jeffrey Flynt 29d ago

GroundEval: A Deterministic Replacement for LLM-as-Judge in Stateful Agent Evaluation

GroundEval: deterministic alternative to LLM judges for agent evaluation, verifying agent search, retrieval, and citation behavior through execution traces.

Ax Xinyuan Song, Zekun Cai 29d ago

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

Grounded Iterative Language Planning: parameterized world models for LLM agents reducing hallucination propagation through measurable transition prediction.

Ax Tianlong Wang, Yuhang Wang, Weibin Liao, Xin Gao, Xinyu Ma, Yang Lin, Yasha Wang, Liantao Ma 29d ago

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

Dynamic representation editing framework that steers LLM reasoning trajectories toward truth by analyzing the geometry of correctness in reasoning chains.

Ax Tianyu Jin, Shuo Chen, Yida Wang, Liuyu Xiang, Yingzhuo Liu, Zhiyao Jiang, Yexin Li, Zhaofeng He 29d ago

SAGA: Scene-Aware, Goal-Evolving Agents for Long-Horizon CivRealm Strategy Planning

SAGA: scene-aware multi-agent system for long-horizon strategy planning in CivRealm addressing scene blindness, context overflow, and cross-game learning.

Ax Anuj Kaul, Qianlong Lan, Pranay Gupta 29d ago

Behavioral Governance for Autonomous AI Agents: The AgentBound Framework

AgentBound: behavioral governance framework for autonomous AI agents controlling consequential actions (transactions, communications) based on operational context.

Ax Arshia Soltani Moakhar, Iman Gholami, Max Springer, Mahdi JafariRaviz, MohammadTaghi Hajiaghayi 29d ago

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

Framework for autoformalization: automatic translation of natural-language research mathematics into verifiable Lean 4 code using LLM agents, reaching beyond what standard libraries cover.

Ax Kaiwen Xiong, Haonian Ji, Shi Qiu, Zeyu Zheng, Cihang Xie, Xinyu Ye, Huaxiu Yao 29d ago

ClawArena-Team: Benchmarking Subagent Orchestration and Dynamic Workflows in Language-Model Agents

ClawArena-Team: benchmark for evaluating LLM agents managing subagents through dynamic workflows with parallel asynchronous orchestration.

Ax Shreya Rajpal, Tanawan Premsri, Parisa Kordjamshidi 29d ago

Spatial Reasoning via Modality Switching Between Language and Symbolic Representation

Framework for spatial reasoning via switching between language and symbolic representations (layouts, grids) to improve multi-hop reasoning in LLMs.

Ax Yankai Jiang, Weiting Tang, Haoran Sun, Zhenyu Tang, Yuejie Hou, Yingnan Han, Rubo Wang, Yueyuxiao Yang, Cheng Liang, Lilong Wang, Wenjie Lou, Xiaosong Wang, Lei Bai, Meng Yang 29d ago

A Self-Evolving Agentic System for Automated Generation and Execution of Biological Protocols

ProtoPilot: self-evolving multi-agent system for automated generation and execution of biological lab protocols with alignment between design and physical execution.

Ax Tong Xiao, Jingbo Zhu 29d ago

Introduction to Transformers: an NLP Perspective

Introduction to Transformer architecture covering basic concepts, model refinements, and NLP applications.