TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems
TFRBench: Benchmark for evaluating reasoning capabilities of time-series forecasting systems beyond numerical accuracy metrics.
Using LLMs as judges to evaluate lightweight segmentation models for drone-based power line inspection under distribution shift.
Domain-invariant neurons approach for cross-domain knowledge transfer to boost LLM reasoning in expertise-scarce specialized domains.
Empirical study on using cross-domain demonstrations to improve in-context learning when expert annotations in target domain are scarce.
HYVE framework for LLMs to better process machine data (logs, metrics, traces) through hybrid structured/unstructured representations.
CODESTRUCT: LLM-based code agents using structured AST action spaces instead of text matching for reliable code editing and repository interaction.
Research on multi-agent pathfinding algorithms handling non-unit edge costs and continuous-time actions for real-world robotic/logistics scenarios.
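As generic background for the non-unit-cost setting (this is standard single-agent shortest-path machinery, not the surveyed multi-agent algorithms): once edge costs are non-unit, plain BFS no longer finds optimal paths and a priority-queue search such as Dijkstra's algorithm is needed. A minimal sketch, with the graph structure purely illustrative:

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path with non-unit edge costs.
    `graph` maps node -> list of (neighbor, cost) pairs."""
    pq = [(0.0, start, [start])]   # (cost so far, node, path taken)
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return float("inf"), []

# Toy weighted graph: the direct A->C edge (4.0) loses to A->B->C (2.5).
grid = {"A": [("B", 1.5), ("C", 4.0)],
        "B": [("C", 1.0), ("D", 5.0)],
        "C": [("D", 1.2)]}
print(dijkstra(grid, "A", "D"))  # -> (3.7, ['A', 'B', 'C', 'D'])
```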
PRISM-MCTS learning approach using reasoning trajectories with metacognitive reflection, inspired by reasoning models like OpenAI o1, for efficient reasoning in low-resource NLP settings.
Automated framework using locally-deployed LLMs to audit hospital discharge summaries at scale, enforcing transition-of-care documentation requirements for patient safety.
Adaptive serverless resource management framework using slot-survival prediction and event-driven architecture to optimize cold start latency and utilization.
OntoTKGE model for temporal knowledge graph extrapolation leveraging ontological knowledge to handle sparse historical interactions and enable behavioral pattern inheritance.
GMRL-BD algorithm using bias-diffusion and multi-agent RL to detect the topic boundaries beyond which LLM answers cannot be reliably trusted, identifying untrustworthy domains.
Auditable Agents framework establishing accountability, auditability, and auditing definitions for LLM agents with external effects, addressing post-deployment answerability.
SCMAPR stage-wise multi-agent refinement framework for complex scenario text-to-video generation that refines and self-corrects ambiguous prompts through agent collaboration.
Thinking Diffusion method adding reasoning penalization and guidance to diffusion-based multimodal LLMs, combining Chain-of-Thought reasoning with parallel generation capabilities.
OmniDiagram unified framework for code generation across diverse diagram types and languages using visual interrogation reward for alignment with visual specifications.
UniCreative approach using reference-free reinforcement learning to balance long-form coherence and short-form expressiveness in LLM-based creative writing generation.
Market-Bench comprehensive benchmark evaluating LLM capabilities in economically relevant tasks via configurable multi-agent supply chain model with LLM retailer agents.
ActivityEditor dual-LLM-agent framework for zero-shot cross-regional human trajectory generation, synthesizing physically valid mobility patterns without region-specific historical data.
Analysis of 12,007 rank-invariant pseudo-Boolean landscapes introducing stronger notion of rank landscape equivalence under translation and rotation symmetries.
Echo memory framework for multimodal LLM agents enabling transfer of reusable knowledge across Minecraft tasks by decomposing experience into five interpretable dimensions.
SignalClaw framework using LLMs as evolutionary skill generators to synthesize interpretable traffic signal control strategies balancing effectiveness and explainability.
Introduces Tree Decision Diagrams generalizing OBDD for Boolean function representation with improved succinctness and tractable operations like model counting and conditioning.
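For background on why model counting is called tractable here (this illustrates the standard bottom-up count on ordered decision diagrams, not the paper's Tree Decision Diagram construction): on an ordered diagram, the number of satisfying assignments follows from one recursive pass, multiplying by 2 for each variable a path skips. A minimal sketch with a hypothetical node encoding:

```python
# Hypothetical minimal ordered decision diagram: an internal node is a
# tuple (var_index, low_child, high_child); terminals are True / False.
# Variables are ordered 0..N_VARS-1 along every path.
N_VARS = 3

def model_count(node, level=0):
    """Count satisfying assignments of the function rooted at `node`,
    given that variables `level..N_VARS-1` are still unassigned."""
    if node is True:
        return 2 ** (N_VARS - level)   # all remaining assignments satisfy
    if node is False:
        return 0
    var, lo, hi = node
    skipped = var - level              # variables untested on this edge
    return (2 ** skipped) * (model_count(lo, var + 1) + model_count(hi, var + 1))

# f(x0, x1, x2) = x0 AND x2; x1 is skipped, so its 2 values are free.
f = (0, False, (2, False, True))
print(model_count(f))  # -> 2
```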
Neurosymbolic approach combining LLMs with Logic Tensor Networks for auditable offer validation in regulated procurement, ensuring factually correct and legally verifiable decisions.
COSMO-Agent tool-augmented RL framework teaching LLMs to bridge CAD-CAE gap by translating simulation feedback into valid geometric edits for iterative industrial design optimization.
ResearchEVO framework for automated scientific discovery using LLMs to conduct undirected experimentation and generate explanations, instantiating discover-then-explain paradigm computationally.
Research on LLM-as-a-Judge showing, via counterfactual design and eye-tracking, that both humans and LLMs favor content labeled as human-authored over identical content labeled as AI-generated.
Philosophical critique of behavioral evaluation paradigms for AI systems and proposal for cognitive assessment methods.
PECKER algorithm for efficient machine unlearning in diffusion models with directed gradient updates.
CuraLight framework combining RL and LLMs for traffic signal control with debate-guided data curation.
LudoBench benchmark evaluating LLM strategic reasoning in Ludo board game with 480 handcrafted scenarios.
Quality-aware mixture of experts for multimodal sentiment analysis robust to noise and modality missingness.
Unlearn-and-Reinvent pipeline testing whether LLMs can rediscover foundational algorithms after those algorithms are removed via unlearning.
Study on cultural evolution showing minimal social learning can transmit higher-level representations without inference.
Hierarchical RL framework (STEP-HRL) for LLM agents using step-level transitions to reduce computational cost and history length.
Vision-language model critic for automated iterative refinement of frontend code generation with visual feedback loops.
Open-source framework for autonomous LLM agents conducting deep learning experiments with hypothesis formation, training, and iterative refinement.
Diagnostic framework determining when LLMs are necessary for contextual multi-armed bandits with text and numerical context.
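For context on the non-LLM side of such a diagnostic, here is a generic epsilon-greedy linear contextual bandit over hashed text features plus a numeric feature; all names and the featurization are illustrative assumptions, not the framework's method:

```python
import random

random.seed(0)

def featurize(text, num_bins=8):
    """Toy text context: hash words into a fixed-size count vector."""
    vec = [0.0] * num_bins
    for word in text.split():
        vec[hash(word) % num_bins] += 1.0
    return vec

class EpsilonGreedyBandit:
    """Linear-score epsilon-greedy bandit over mixed text/numeric context."""
    def __init__(self, n_arms, dim, epsilon=0.1, lr=0.05):
        self.epsilon = epsilon
        self.lr = lr
        self.weights = [[0.0] * dim for _ in range(n_arms)]

    def select(self, context):
        if random.random() < self.epsilon:
            return random.randrange(len(self.weights))   # explore
        scores = [sum(w * x for w, x in zip(ws, context)) for ws in self.weights]
        return scores.index(max(scores))                 # exploit

    def update(self, arm, context, reward):
        # SGD step toward the observed reward, for the chosen arm only
        pred = sum(w * x for w, x in zip(self.weights[arm], context))
        err = reward - pred
        for i, x in enumerate(context):
            self.weights[arm][i] += self.lr * err * x

bandit = EpsilonGreedyBandit(n_arms=2, dim=9)
for _ in range(500):
    ctx = featurize("urgent refund request") + [1.0]  # text + numeric feature
    arm = bandit.select(ctx)
    reward = 1.0 if arm == 1 else 0.0                 # arm 1 is always better here
    bandit.update(arm, ctx, reward)
```

After training, the learned score for arm 1 exceeds arm 0 on this context, so greedy selection switches to the better arm.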
JTON format, JSON superset with Zen Grid encoding for token-efficient structured data processing in LLMs.
Joint knowledge base completion and question answering using combined large and small language models.
KV cache compression technique for multimodal LLM inference, reducing memory overhead and latency with hybrid compression strategy.
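As a toy illustration of the general eviction idea behind KV-cache compression (a generic attention-based heuristic, not the paper's hybrid strategy; all names are illustrative):

```python
def prune_kv_cache(keys, values, attn_weights, keep=4):
    """Toy KV-cache pruning: keep the `keep` cached entries that received
    the most cumulative attention, preserving their original order."""
    ranked = sorted(range(len(keys)), key=lambda i: attn_weights[i], reverse=True)
    kept = sorted(ranked[:keep])   # restore positional order for the kept entries
    return [keys[i] for i in kept], [values[i] for i in kept]

keys   = ["k0", "k1", "k2", "k3", "k4", "k5"]
values = ["v0", "v1", "v2", "v3", "v4", "v5"]
attn   = [0.30, 0.05, 0.20, 0.02, 0.25, 0.18]   # cumulative attention per entry
k, v = prune_kv_cache(keys, values, attn, keep=4)
print(k)  # -> ['k0', 'k2', 'k4', 'k5']
```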
Architecture for value-driven LLM agents addressing behavioral rigidity through context-value-action design.
Foundation model enabling single GPT-based agent to perform across diverse multi-agent reinforcement learning tasks and environments.
Research agent framework for generating trustworthy reports with confidence estimation and calibration mechanisms.
Multi-objective preference alignment for LLMs using Pareto-lenient consensus to handle diverse human values in model training.
AI agents for retail supply chain operations, automating demand forecasting, procurement, and inventory replenishment in supermarket chains.
Proposes epistemic blinding, an inference-time auditing protocol to separate memorized priors from data-driven inference in LLM-assisted agentic analysis systems.
Investigates instruction-following mechanisms in LLMs through diagnostic probing, finding evidence for compositional skill deployment over universal mechanism.
Proposes ACE-Bench, agent evaluation benchmark with unified grid-based planning tasks, lightweight environments, and configurable difficulty/horizon control.
Introduces Claw-Eval, an end-to-end evaluation suite for autonomous agents addressing trajectory-opaque grading, safety, and interaction modality coverage.