Isolater - Feed

Ax Song-Ju Kim 25d ago

Contextuality as an External Bookkeeping Cost under Fixed Shared-State Semantics

Theoretical analysis of contextuality in quantum information systems as external bookkeeping cost under classical simulation.

Ax Uday Allu, Sonu Kedia, Tanmay Odapally, Biddwan Ahmed 25d ago

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

Proposes Web Retrieval-Aware Chunking (W-RAC) for efficient RAG document chunking to balance retrieval quality, latency, and cost on web-scale content.

Ax Jiaquan Zhang, Qigan Sun, Chaoning Zhang, Xudong Wang, Zhenzhen Huang, Yitian Zhou, Pengcheng Zheng, Chi-lok Andy Tai, Sung-Ho Bae, Zeyu Ma, Caiyan Qin, Jinyu Guo, Yang Yang, Hengtao Shen 25d ago

TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models

Proposes Task-Driven Alignment (TDA-RC) for improving reasoning chains in LLMs by bridging logical gaps between CoT and multi-round thought paradigms.

Ax Julian Coda-Forno, Jane X. Wang, Arslan Chaudhry 25d ago

The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse

Evaluates bidirectional training objectives (MLM, masked attention) to mitigate the reversal curse in autoregressive language models.

Ax Mohammad Reza Ghasemi Madani, Soyeon Caren Han, Shuo Yang, Jey Han Lau 25d ago

Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

Introduces Inclusion-of-Thoughts (IoT), a strategy to reduce LLM instability on multiple-choice questions by filtering irrelevant distractors.

Ax Nitish Kumar, Sannu Kumar, S Akash, Manish Gupta, Ankith Karat, Sriparna Saha 25d ago

SUMMIR: A Hallucination-Aware Framework for Ranking Sports Insights from LLMs

Proposes the SUMMIR framework for ranking sports insights extracted by LLMs, addressing hallucinations with a 7,900-article dataset across four sports.

Ax José Guilherme Marques dos Santos, Ricardo Yang, Rui Humberto Pereira, Alexandre Sousa, Brígida Mónica Faria, Henrique Lopes Cardoso, José Duarte, José Luís Reis, Luís Paulo Reis, Pedro Pimenta, José Paulo Marques dos Santos 25d ago

From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering

Evaluates four open-source PDF-to-Markdown conversion frameworks (Docling, MinerU, Marker, DeepSeek OCR) and how this RAG preprocessing step affects QA accuracy.

Ax Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen 25d ago

Learning to Retrieve from Agent Trajectories

Studies how to design information retrieval systems for LLM agents versus humans, proposing learning-to-rank methods for agent trajectories.

Ax Muhammad Tahir Ashraf 25d ago

Synthetic Trust Attacks: Modeling How Generative AI Manipulates Human Decisions in Social Engineering Fraud

Analysis of how generative AI enables social engineering fraud and trust manipulation attacks in financial crime scenarios.

Ax Abhishek Dharmaratnakar, Srivaths Ranganathan, Debanshu Das, Anushree Sinha 25d ago

Generative AI for Video Trailer Synthesis: From Extractive Heuristics to Autoregressive Creativity

Surveys the transition from heuristic-based to generative synthesis methods for automatic video trailer generation using LLMs and diffusion models.

Ax William Yicheng Zhu, Lei Zhu 25d ago

The Planetary Cost of AI Acceleration, Part II: The 10th Planetary Boundary and the 6.5-Year Countdown

Opinion piece on environmental and computational costs of scaling LLM agents and implications for planetary boundaries.

Ax Xinhong Xu, Yimeng Zhang, Qichen Qian, Yuanlong Zhang 25d ago

Self-Supervised Foundation Model for Calcium-imaging Population Dynamics

Self-supervised foundation model (CalM) trained on neuronal calcium traces for neuroscience task transfer learning.

Ax Sijun Dai, Qiang Huang, Xiaoxing You, Jun Yu 25d ago

MG²-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation

Proposes MG²-RAG, a multi-granularity graph approach for retrieval-augmented generation in multimodal LLMs to improve cross-modal reasoning without costly text translation.

Ax Zimo Ji, Zongjie Li, Wenyuan Jiang, Yudong Gao, Shuai Wang 25d ago

Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode

Independent evaluation of Claude Code's auto mode permission system for AI coding agents, testing security gates on ambiguous authorization scenarios.

Ax Ádám Kovács 25d ago

Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents

Introduces Squeez, a method for pruning tool outputs in coding agents by identifying minimal relevant evidence blocks. Includes 11,477-example benchmark from SWE-bench.

Ax Ziheng Chen, Jiali Cheng, Zezhong Fan, Hadi Amiri, Yunzhi Yao, Xiangguo Sun, Yang Zhang 25d ago

CURE: Circuit-Aware Unlearning for LLM-based Recommendation

CURE enables privacy-preserving unlearning in LLM-based recommendation systems using circuit-aware techniques for removing user data.

Ax Yongchang Hao, Lili Mou 25d ago

Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling

Cactus improves speculative sampling for LLM decoding by relaxing strict distribution matching to allow acceptable variations like top-k sampling.

Ax Longsheng Zhou, Yu Shen 25d ago

Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

Prune-Quantize-Distill pipeline for neural network compression optimizing wall-clock inference time rather than parameter count or FLOPs.

Ax Phongsakon Mark Konrad, Tim Lukas Adam, Riccardo Terrenzi, Serkan Ayvaz 25d ago

Architecture Without Architects: How AI Coding Agents Shape Software Architecture

Analysis of implicit architectural decisions made by AI coding agents, identifying five mechanisms and six prompt-architecture coupling patterns.

Ax Daniel Kuznetsov, Ofir Cohen, Karin Shistik, Rami Puzis, Asaf Shabtai 25d ago

FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment

FreakOut-LLM framework investigates whether emotionally charged prompts compromise safety alignment in ten LLMs using psychological stimuli.

Ax Rong Lu, Hao Liu, Song Hou 25d ago

Evaluation of Embedding-Based and Generative Methods for LLM-Driven Document Classification: Opportunities and Challenges

Comparative evaluation of embedding-based and generative models for document classification, showing Vision-Language Models with CoT achieve 82% zero-shot accuracy.

Ax Kai Yu, Shuang Zhou, Yiran Song, Zaifu Zhan, Jie Peng, Kaixiong Zhou, Tianlong Chen, Feng Xie, Meng Wang, Huazhu Fu, Mingquan Lin, Rui Zhang 25d ago

PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities

PRIME enables multimodal self-supervised pretraining for cancer prognosis with missing modalities by combining histopathology, gene expression, and reports.

Ax Elias Calboreanu 25d ago

Closed-Loop Autonomous Software Development via Jira-Integrated Backlog Orchestration: A Case Study in Deterministic Control and Safety-Constrained Automation

Case study of closed-loop software development system managing backlog via deterministic pipeline with Jira integration and safety constraints.

Ax Mehrdad Shoeibi, Elias Hossain, Ivan Garibay, Niloofar Yousefi 25d ago

Learning Stable Predictors from Weak Supervision under Distribution Shift

Studies learning from weak supervision under distribution shift in CRISPR-Cas13d experiments where guidance efficacy is indirectly inferred.

Ax Shuzhen Bi, Mingzi Zhang, Zhuoxuan Li, Xiaolong Wang, Keqian Li, Aimin Zhou 25d ago

EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content

EduIllustrate benchmark evaluates LLMs on generating multimodal educational content combining accurate diagrams with step-by-step explanations.

Ax Jia Li, Yinfeng Yu 25d ago

Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction

BDATP framework for audio-visual navigation using binaural attention and action prediction to improve generalization in unseen 3D environments.

Ax Moeen AL-Makhlafi, Abdulrahman A. AlKannad, Eiad Almekhlafi, Nawaf Q. Othman Ahmed Mohammed, Saher Qaid 25d ago

YMIR: A new Benchmark Dataset and Model for Arabic Yemeni Music Genre Classification Using Convolutional Neural Networks

YMIR dataset and CNN model for classifying five Yemeni music genres, addressing underrepresentation of non-Western music in MIR research.

Ax Oteo Mamo, Olga Kogiou, Hyunjin Yi, Weikuan Yu 25d ago

Comparative Characterization of KV Cache Management Strategies for LLM Inference

Comparative analysis of key-value cache management strategies for efficient LLM inference under different model sizes and context lengths.

Ax Yingwei Ma, Yue Liu, Xinlong Yang, Yanhao Li, Kelin Fu, Yibo Miao, Yuchong Xie, Zhexu Wang, Shing-Chi Cheung 25d ago

Scaling Coding Agents via Atomic Skills

Proposes training LLM coding agents on five atomic coding skills (localization, editing, testing, reproduction, review) for improved generalization.

Ax StarVLA Community 25d ago

StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

StarVLA provides a modular open-source codebase for building vision-language-action embodied agents with standardized evaluation protocols.

Ax Gowrav Vishwakarma, Christopher J. Agostino 25d ago

Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space

Phase-Associative Memory is a recurrent sequence model using complex-valued representations achieving competitive perplexity on WikiText-103.

Ax Julia Chae, Nicholas Kolkin, Jui-Hsien Wang, Richard Zhang, Sara Beery, Cusuh Ham 25d ago

ID-Sim: An Identity-Focused Similarity Metric

ID-Sim proposes an identity-focused similarity metric for vision models to improve evaluation of personalized image generation tasks.

Ax Ankit Hemant Lade, Sai Krishna Jasti, Nikhil Sinha, Indar Kumar, Akanksha Tiwari 25d ago

PCA-Driven Adaptive Sensor Triage for Edge AI Inference

PCA-Triage is a streaming algorithm for adaptive sensor sampling in IoT networks using principal component analysis to manage bandwidth constraints.

Ax Hye Sun Yun, Geetika Kapoor, Michael Mackert, Ramez Kouzy, Wei Xu, Junyi Jessy Li, Byron C. Wallace 25d ago

This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA

Study evaluating LLM sensitivity to prompt phrasing in medical question answering, showing inconsistent responses despite identical underlying evidence.

Ax Annita Vapsi, Penghang Liu, Saheed Obitayo, Aakriti, Manoj Cherukumalli, Prathamesh Patil, Amit Varshney, Nicolas Marchesotti, Elizabeth Fons, Vamsi K. Potluru, Manuela Veloso 25d ago

Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series

DynLMC generates synthetic multivariate time series with time-varying correlations and cross-channel dependencies for training foundation models.

Ax Yifan Zhu, Yekai Pan, Yanghui Wu, Chen Ding 25d ago

AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels

arXiv paper presenting AutoLALA, an open-source tool that analyzes data locality in loop programs for HPC and AI workloads.

Ax MD Shafikul Islam, Mahathir Mohammad Bappy, Saifur Rahman Tushar, Md Arifuzzaman 25d ago

Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing

arXiv paper on privacy-preserving graph learning for additive manufacturing sensor data using differential privacy techniques.

Ax Danil Gorinevski (cybiont GmbH, Schübelbach, Switzerland) 25d ago

Nidus: Externalized Reasoning for AI-Assisted Engineering

arXiv paper on Nidus, a governance runtime that uses Claude, Gemini, and Codex to mechanize the V-model for AI-assisted software delivery.

Ax Firoj Alam, Gagan Bhatia, Sahinur Rahman Laskar, Shammur Absar Chowdhury 25d ago

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

arXiv paper proposing OmniScore, a set of deterministic evaluation metrics for multilingual text generation, as an alternative to LLM judges.

Ax Amir M. Ebrahimi, Gopi Krishnan Rajbahadur 25d ago

Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks

arXiv paper auditing code-editing benchmarks for LLMs, finding flaws in existing evaluation methods for instructed code modification.

Ax Jorge Alberto Garza-Abdala, Gerardo A. Fumagal-González, Eduardo de Avila-Armenta, Sadam Hussain, Jasiel H. Toscano-Martínezb, Diana S. M. Rosales Gurmendi, Alma A. Pedro-Pérez, Jose G. Tamez-Pena 25d ago

Simultaneous Dual-View Mammogram Synthesis Using Denoising Diffusion Probabilistic Models

arXiv paper on diffusion models for medical imaging, generating paired mammogram views for cancer screening datasets.

Ax Andrei Polubarov, Lyubaykin Nikita, Alexander Derevyagin, Artyom Grishin, Igor Saprygin, Aleksandr Serkov, Mark Averchenko, Daniil Tikhonov, Maksim Zhdanov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Alexey Zemtsov, Vladislav Kurenkov 25d ago

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

arXiv paper on Decision Pre-Trained Transformer for in-context reinforcement learning, enabling scalable generalist agent training.

Ax Zezhong Fan, Ziheng Chen, Luyi Ma, Jin Huang, Lalitesh Morishetti, Kaushiki Nag, Sushant Kumar, Kannan Achan 25d ago

CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation

arXiv paper on CRAB method for mitigating popularity bias in generative recommendation systems via codebook rebalancing.

Ax Quyet V. Do, Thinh Pham, Nguyen Nguyen, Sha Li, Pratibha Zunjare, Tu Vu 25d ago

π²: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models

arXiv paper presenting the π² pipeline for curating reasoning data from structured sources to improve LLM long-context reasoning.

Ax Yuxuan Zhang, EunJeong Hwang, Huaisong Zhang, Penghui Du, Yiming Jia, Dongfu Jiang, Xuan He, Shenhui Zhang, Ping Nie, Peter West, Kelsey R. Allen 25d ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

arXiv paper on vision-language models learning from grounded video data, finding text-only bias in video benchmarks.

Ax Ruslan Sharifullin, Maxim Gorshkov, Hannah Clay 25d ago

Offline RL for Adaptive Policy Retrieval in Prior Authorization

arXiv paper modeling prior authorization policy retrieval as a Markov decision process (MDP) for adaptive decision-making in healthcare insurance.

Ax Lucas Dionisopoulos, Nicklas Majamaki, Prithviraj Ammanabrolu 25d ago

Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

arXiv paper on how reasoning evolves in language models through fine-tuning and RL, studied via chess task performance.

Ax Samira Hajizadeh, Suman Jana 25d ago

EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback

EffiPair: Relative Contrastive Feedback method for improving runtime and memory efficiency of LLM-generated code without model fine-tuning.

Ax Geert Trooskens (XY.AI Labs, Palo Alto, CA), Aaron Karlsberg (XY.AI Labs, Palo Alto, CA), Anmol Sharma (XY.AI Labs, Palo Alto, CA), Lamara De Brouwer (XY.AI Labs, Palo Alto, CA), Max Van Puyvelde (Stanford University School of Medicine, Stanford, CA), Matthew Young (XY.AI Labs, Palo Alto, CA), John Thickstun (Cornell University, Ithaca, NY), Gil Alterovitz (Brigham and Women's Hospital / Harvard Medical School, Boston, MA), Walter A. De Brouwer (Stanford University School of Medicine, Stanford, CA) 25d ago

Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation

Compiled AI: Paradigm where LLMs generate executable code during compilation for deterministic, model-free workflow automation execution.

Ax Alfonso Amayuelas, Firas Laakom, Piotr Piękos, Wenyi Wang, Yifan Xu, Yuhui Wang, Jürgen Schmidhuber, William Wang 25d ago

Planning to Explore: Curiosity-Driven Planning for LLM Test Generation

Planning to Explore: Curiosity-driven planning approach for LLM-based test generation using Bayesian principles to reach deep code branches.