Proposes ACE-Bench, an agent evaluation benchmark with unified grid-based planning tasks, lightweight environments, and configurable difficulty/horizon control.
Introduces Claw-Eval, an end-to-end evaluation suite for autonomous agents addressing trajectory-opaque grading, safety, and interaction modality coverage.
Theoretical analysis of contextuality in quantum information systems as an external bookkeeping cost under classical simulation.
Proposes Web Retrieval-Aware Chunking (W-RAC) for efficient RAG document chunking to balance retrieval quality, latency, and cost on web-scale content.
Proposes Task-Driven Alignment (TDA-RC) for improving reasoning chains in LLMs by bridging logical gaps between CoT and multi-round thought paradigms.
Evaluates bidirectional training objectives (MLM, masked attention) to mitigate the reversal curse in autoregressive language models.
Introduces Inclusion-of-Thoughts (IoT), a strategy to reduce LLM instability on multiple-choice questions by filtering irrelevant distractors.
Proposes the SUMMIR framework for ranking sports insights extracted by LLMs, addressing hallucinations with a 7,900-article dataset across four sports.
Evaluates four open-source PDF-to-Markdown conversion frameworks (Docling, MinerU, Marker, DeepSeek OCR) for RAG document preprocessing impact on QA accuracy.
Studies how to design information retrieval systems for LLM agents versus humans, proposing learning-to-rank methods for agent trajectories.
Analysis of how generative AI enables social engineering fraud and trust manipulation attacks in financial crime scenarios.
Surveys the transition from heuristic-based to generative synthesis methods for automatic video trailer generation using LLMs and diffusion models.
Opinion piece on environmental and computational costs of scaling LLM agents and implications for planetary boundaries.
Self-supervised foundation model (CalM) trained on neuronal calcium traces for transfer learning across neuroscience tasks.
Proposes MG²-RAG, a multi-granularity graph approach for retrieval-augmented generation in multimodal LLMs to improve cross-modal reasoning without costly text translation.
Independent evaluation of Claude Code's auto mode permission system for AI coding agents, testing security gates on ambiguous authorization scenarios.
Introduces Squeez, a method for pruning tool outputs in coding agents by identifying minimal relevant evidence blocks; includes an 11,477-example benchmark derived from SWE-bench.
CURE enables privacy-preserving unlearning in LLM-based recommendation systems using circuit-aware techniques for removing user data.
Cactus improves speculative sampling for LLM decoding by relaxing strict distribution matching to allow acceptable variations like top-k sampling.
Prune-Quantize-Distill pipeline for neural network compression optimizing wall-clock inference time rather than parameter count or FLOPs.
Analysis of implicit architectural decisions made by AI coding agents, identifying five mechanisms and six prompt-architecture coupling patterns.
FreakOut-LLM framework investigates whether emotionally charged prompts compromise safety alignment in ten LLMs using psychological stimuli.
Comparative evaluation of embedding-based and generative models for document classification, showing that Vision-Language Models with CoT achieve 82% zero-shot accuracy.
PRIME enables multimodal self-supervised pretraining for cancer prognosis with missing modalities by combining histopathology, gene expression, and reports.
Case study of closed-loop software development system managing backlog via deterministic pipeline with Jira integration and safety constraints.
Studies learning from weak supervision under distribution shift in CRISPR-Cas13d experiments where guide efficacy is indirectly inferred.
EduIllustrate benchmark evaluates LLMs on generating multimodal educational content combining accurate diagrams with step-by-step explanations.
BDATP framework for audio-visual navigation using binaural attention and action prediction to improve generalization in unseen 3D environments.
YMIR dataset and CNN model for classifying five Yemeni music genres, addressing underrepresentation of non-Western music in MIR research.
Comparative analysis of key-value cache management strategies for efficient LLM inference under different model sizes and context lengths.
Proposes training LLM coding agents on five atomic coding skills (localization, editing, testing, reproduction, review) for improved generalization.
StarVLA provides a modular open-source codebase for building vision-language-action embodied agents with standardized evaluation protocols.
Phase-Associative Memory is a recurrent sequence model using complex-valued representations achieving competitive perplexity on WikiText-103.
ID-Sim proposes an identity-focused similarity metric for vision models to improve evaluation of personalized image generation tasks.
PCA-Triage is a streaming algorithm for adaptive sensor sampling in IoT networks using principal component analysis to manage bandwidth constraints.
Study evaluating LLM sensitivity to prompt phrasing in medical question answering, showing inconsistent responses despite identical underlying evidence.
DynLMC generates synthetic multivariate time series with time-varying correlations and cross-channel dependencies for training foundation models.
Presents AutoLALA, an open-source tool for analyzing data locality in loop programs for HPC and AI workloads.
Applies differential privacy techniques to privacy-preserving graph learning on additive manufacturing sensor data.
Introduces Nidus, a governance runtime that uses Claude, Gemini, and Codex to mechanize the V-model for AI-assisted software delivery.
Proposes OmniScore, a set of deterministic evaluation metrics for multilingual text generation as an alternative to LLM judges.
Audits code-editing benchmarks for LLMs, finding flaws in existing evaluation methods for instructed code modification.
Applies diffusion models to medical imaging, generating paired mammogram views for cancer screening datasets.
Presents a Decision Pre-Trained Transformer for in-context reinforcement learning, enabling scalable training of generalist agents.
Proposes CRAB, a method for mitigating popularity bias in generative recommendation systems via codebook rebalancing.
Presents π², a pipeline for curating reasoning data from structured sources to improve LLM long-context reasoning.
Studies vision-language models learning from grounded video data, finding text-only bias in video benchmarks.
Models prior authorization policy retrieval as an MDP for adaptive decision-making in healthcare insurance.
Studies how reasoning evolves in language models through fine-tuning and RL, via chess task performance.
EffiPair improves the runtime and memory efficiency of LLM-generated code via Relative Contrastive Feedback, without model fine-tuning.
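Background for the Cactus entry above: the standard speculative-sampling acceptance rule that it relaxes is well established and can be sketched in a few lines. This is a minimal illustration of the generic accept/reject test, not Cactus's own method; the function name and scalar-probability interface are assumptions for the sketch.

```python
import random

def accept_draft_token(p_target: float, p_draft: float) -> bool:
    """Standard speculative-sampling accept/reject test for one draft token.

    p_target: target model's probability of the drafted token
    p_draft:  draft model's probability of the same token (must be > 0)

    Accepts with probability min(1, p_target / p_draft). On rejection,
    the caller resamples from the residual distribution
    max(0, p_target - p_draft) (normalized), which makes the combined
    procedure sample exactly from the target distribution.
    """
    return random.random() < min(1.0, p_target / p_draft)
```

Methods like the one summarized above loosen this exact-match guarantee (e.g. tolerating top-k-style deviations) to accept more draft tokens per step.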