Ax José Guilherme Marques dos Santos, Ricardo Yang, Rui Humberto Pereira, Alexandre Sousa, Brígida Mónica Faria, Henrique Lopes Cardoso, José Duarte, José Luís Reis, Luís Paulo Reis, Pedro Pimenta, José Paulo Marques dos Santos 25d ago

From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering

Evaluates four open-source PDF-to-Markdown conversion frameworks (Docling, MinerU, Marker, DeepSeek OCR) as RAG preprocessing steps, measuring their impact on domain-specific QA accuracy.

Ax Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen 25d ago

Learning to Retrieve from Agent Trajectories

Studies how to design information retrieval systems for LLM agents versus humans, proposing learning-to-rank methods for agent trajectories.

Ax Ziheng Chen, Jiali Cheng, Zezhong Fan, Hadi Amiri, Yunzhi Yao, Xiangguo Sun, Yang Zhang 25d ago

CURE: Circuit-Aware Unlearning for LLM-based Recommendation

CURE enables privacy-preserving unlearning in LLM-based recommendation systems using circuit-aware techniques for removing user data.

Ax Yingwei Ma, Yue Liu, Xinlong Yang, Yanhao Li, Kelin Fu, Yibo Miao, Yuchong Xie, Zhexu Wang, Shing-Chi Cheung 25d ago

Scaling Coding Agents via Atomic Skills

Proposes training LLM coding agents on five atomic coding skills (localization, editing, testing, reproduction, review) for improved generalization.

Ax Julia Chae, Nicholas Kolkin, Jui-Hsien Wang, Richard Zhang, Sara Beery, Cusuh Ham 25d ago

ID-Sim: An Identity-Focused Similarity Metric

ID-Sim is an identity-focused similarity metric for vision models, intended to improve evaluation of personalized image generation tasks.

Ax Ankit Hemant Lade, Sai Krishna Jasti, Nikhil Sinha, Indar Kumar, Akanksha Tiwari 25d ago

PCA-Driven Adaptive Sensor Triage for Edge AI Inference

PCA-Triage is a streaming algorithm for adaptive sensor sampling in IoT networks using principal component analysis to manage bandwidth constraints.

Ax Annita Vapsi, Penghang Liu, Saheed Obitayo, Aakriti, Manoj Cherukumalli, Prathamesh Patil, Amit Varshney, Nicolas Marchesotti, Elizabeth Fons, Vamsi K. Potluru, Manuela Veloso 25d ago

Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series

DynLMC generates synthetic multivariate time series with time-varying correlations and cross-channel dependencies for training foundation models.

Ax Andrei Polubarov, Lyubaykin Nikita, Alexander Derevyagin, Artyom Grishin, Igor Saprygin, Aleksandr Serkov, Mark Averchenko, Daniil Tikhonov, Maksim Zhdanov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Alexey Zemtsov, Vladislav Kurenkov 25d ago

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

Presents Vintix II, which applies the Decision Pre-Trained Transformer to in-context reinforcement learning for scalable training of generalist agents.

Ax Geert Trooskens (XY.AI Labs, Palo Alto, CA), Aaron Karlsberg (XY.AI Labs, Palo Alto, CA), Anmol Sharma (XY.AI Labs, Palo Alto, CA), Lamara De Brouwer (XY.AI Labs, Palo Alto, CA), Max Van Puyvelde (Stanford University School of Medicine, Stanford, CA), Matthew Young (XY.AI Labs, Palo Alto, CA), John Thickstun (Cornell University, Ithaca, NY), Gil Alterovitz (Brigham and Women's Hospital / Harvard Medical School, Boston, MA), Walter A. De Brouwer (Stanford University School of Medicine, Stanford, CA) 25d ago

Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation

Compiled AI is a paradigm in which LLMs generate executable code at a compilation stage, so that workflow automation then executes deterministically and model-free.