Isolater - Feed

Ax Franciskus Xaverius Erick, Johanna Paula Müller, Bernhard Kainz 23d ago

Geometry-Aware Uncertainty Coresets for Robust Visual In-Context Learning in Histopathology

Geometry-aware uncertainty coresets for robust few-shot in-context learning with vision-language models on histopathology images.

Ax Husnain Amjad, Raja Khurram Shahzad, Aamir Shahzad, Mehwish Fatima 23d ago

Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Comprehensive survey of mathematical reasoning in LLMs covering benchmarks, architectures, evaluation methods, and open challenges.

Ax Amir Mousavi, Mohammad Sadegh Sirjani, Erfan Nourbakhsh, Mimi Xie, Rocky Slavin, Leslie Neely, John Davis, John Quarles 23d ago

MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data

MambaGaze framework using bidirectional Mamba for cognitive load assessment from eye-tracking data with missing data handling.

Ax Qian Kou, Xiaofeng Shi, Yulin Li, Xiaosong Qiu, Xinyang Wang, Hua Zhou, Cao Dongxing 23d ago

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

MechVQA benchmark for evaluating multimodal LLMs on mechanical engineering drawing understanding and spatial reasoning tasks.

Ax Juan Manuel Contreras 23d ago

An LLM-Native Psychometric Instrument Reveals a Self-Report--Behavior Gap Across 25 Models

LLM-native psychometric instrument reveals gap between model self-reports on personality dimensions and actual behavioral patterns across 25 models.

Ax Rui Melo, Riccardo Fogliato, Sean Zhou, Pratiksha Thaker, Zhiwei Steven Wu 23d ago

SEVRA-BENCH: Social Engineering of Vulnerabilities in Review Agents

SEVRA-BENCH benchmark testing vulnerability detection in LLM-based code review agents against social engineering attacks on pull requests.

Ax Kirill Vasilevski (Justina), Ximing Dong (Justina), Benjamin Rombaut (Justina), Milad Soltany (Justina), Ruochen Deng (Justina), Jiahuei Lin (Justina), Arthur Leung, Dayi Lin, Boyuan Chen, Shaowei Wang, Ahmed E. Hassan 23d ago

Beyond Correctness: Enhancing Architectural Reasoning in Code LLMs via Scalable Labeling with Agentic Judgment

Agentic judging pipeline using LLMs for scalable architectural evaluation of code, improving code LLM quality beyond functional correctness.

Ax Henry Bodwell, Hong Yang, John C. Simeone, Kelvin Gorospe, Bella Sullivan, Lana Huang, Jessica Gephart, Sandy Aylesworth, Molly Masterton, Naren Ramakrishnan 23d ago

IUU+DB: Tracking Illegal, Unreported, and Unregulated Fishing, Seafood Fraud, and Labor Abuse through LLM-driven Information Extraction

LLM-driven information extraction system for tracking illegal fishing, seafood fraud, and labor abuses in supply chains.

Ax Kaiyue Yang, Yuyan Bu, Jingwei Yi, Yuchi Wang, Biyu Zhou, Juntao Dai, Songlin Hu, Yaodong Yang 23d ago

When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

Study of LLM agent tool selection behavior revealing over-privileged tool escalation despite lower-privilege alternatives being sufficient.

Ax Junzhe Xu, Zecui Zeng, Lusong Li, Yuetong Fang, Renjing Xu 23d ago

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

Neuromorphic reinforcement learning framework for pathfinding optimization in robotic warehouse systems with real-time constraints.

Ax Pingchuan Ma, Zhaoyu Wang, Zimo Ji, Yuguang Zhou, Zhantong Xue, Zongjie Li, Shuai Wang, Xiaoqin Zhang 23d ago

AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming

AutoSpec framework for evolving safety rules in LLM agents via inductive logic programming, balancing interpretability with operational flexibility.

Ax Kwan Soo Shin, In Seok Kang, Yunkyung Min, Judy Yang, Munho Lee 23d ago

The Inattentional Gap: Task-Conditioned Language and Vision Models Omit the Safety-Critical Signals They Can Otherwise Report

Research showing task-conditioned language/vision models suppress reporting of safety-critical signals present in data, analogous to human inattentional blindness.

Ax Gerhard Backfried, Christian Schmidt, Diego Pilutti, Michael Suker 23d ago

Application of LLMs to Threat Assessment of Foreign Peacekeeping Missions

LLMs applied to threat extraction from media for peacekeeping mission risk assessment using OSINT collection and structured information mapping.

Ax Sayak Dutta 23d ago

CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

Efficient linear attention mechanism with content-aware recurrent updates and value efficiency for chunk-parallel processing.

Ax Katja Berčič, Slobodan Stanojevikj 23d ago

Categorizing Mathematical Concepts with LLM Voting Ensembles in Mathswitch

Open-source Mathswitch project uses LLM voting ensembles to categorize and link mathematical concepts from multiple sources.

Ax Kathan Shah 23d ago

Token Geometry

Ember optimizer exploits embedding table geometry to improve LLM fine-tuning, RL, and pretraining with minimal optimizer state.

Ax Vadym Hadetskyi, Dario Pasquini, Artem Sorokin 23d ago

Not All Refusals Are Equal: How Safety Alignment Fails Cybersecurity at Scale

Study of how safety alignment in LLMs creates overly restrictive constraints in cybersecurity domains, examining domain-specific refusal patterns.

Ax Zijun Xie, Yuyang You, Yongzhi Li, Enlei Gong, Zeyu Chen, Quan Chen, Yanhua Cheng, Peng Jiang, Yadong Mu 23d ago

ACPO: Adaptive Credit Policy Optimization via Fine-Grained Surrogate Entropy

Fine-grained entropy-based method for improving token-level credit assignment in RL-trained LLMs, addressing sparse reward challenges.

Ax Hyunsoo Lee, Panggah Prabawa, Dae-Hyun Choi, Joongheon Kim 23d ago

Hierarchical Multi-Agent Reinforcement Learning for Carbon-Aware AI Data Centers in Power Distribution Systems

Multi-agent reinforcement learning framework for energy management in AI data centers with carbon-awareness optimization.

Ax Jiwon Kang, Heeji Yoon, Jaewoo Jung, Jaewon Min, Minkyeong Jeon, Biyeon Hwang, Sangwon Jung, Seungryong Kim 23d ago

Transferability Between Understanding and Generation in Unified Multimodal Models

Study of transferability between image understanding and generation tasks in unified multimodal model architectures.

Ax Tianxing Chen, Yue Chen, Zixuan Li, Junyuan Tang, Kailun Su, Haoran Lu, Weijie Wan, Baijun Chen, Songling Liu, Haowen Yan, Honghao Su, Zhiyang Dou, Kaixuan Wang, Dandan Zhang, Yunze Liu, Yan Qin, Qiwei Liang, Qiwei Wu, Zijian Lin, Wenwei Lin, Yuran Wang, Minghua He, Tianshu Wu, Ruihai Wu, Jingquan Zhou, Kai-Chong Lei, Haibao Yu, Yuanfeng Ji, Weiyang Jin, Guanyu Lin, Xiaofan Li, Qi Xiong, Renjing Xu, Zhongyu Li, Wenhao Chai, Enze Xie, Ziwei Wang, Yao Mu, Hao Dong, Wojciech Matusik, Mingyu Ding, Wenbo Ding, Ping Luo, Masayoshi Tomizuka 23d ago

RoboDojo: A Unified Sim-and-Real Benchmark for Comprehensive Evaluation of Generalist Robot Manipulation Policies

RoboDojo: unified sim-and-real benchmark for evaluating generalist robot manipulation policies across diverse tasks.

Ax Lianghua Huang, Zhi-Fan Wu, Yupeng Shi, Wei Wang, Mengyang Feng, Junjie He, Chen-Wei Xie, Yu Liu, Jingren Zhou, Ang Wang, Bang Zhang, Baole Ai, Chen Liang, Cheng Yu, Chongyang Zhong, Jinwei Qi, Kai Zhu, Pandeng Li, Peng Zhang, Wenyuan Zhang, Xinhua Cheng, Yitong Huang, Yun Zheng, Yuxiang Bao, Yuzheng Wang, Zoubin Bi 23d ago

Wan-Streamer v0.2: Higher Resolution, Same Latency

Wan-Streamer v0.2: upgraded streaming audio-visual interaction model achieving 640x368 resolution at 200ms latency.

Ax Mouhamed Amine Bouchiha, Gregory Blanc 23d ago

TACTIC-KG: Toward Small Agent Teams for Cyber Threat Intelligence Knowledge Graph Construction

TACTIC-KG: multi-agent LLM system constructing cyber threat intelligence knowledge graphs from unstructured CTI reports.

Ax Zhifeng Kong, Sang-gil Lee, Jaehyeon Kim, Boxin Wang, Zihan Liu, Sungwon Kim, Yang Chen, Arushi Goel, Rajarshi Roy, Wenliang Dai, Zhuolin Yang, Yangyi Chen, Dongfu Jiang, Sreyan Ghosh, Tuomas Rintamaki, Andrew Tao, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping 23d ago

Unified Audio Intelligence Without Regressing on Text Intelligence

Audex: unified audio-text LLM built on Nemotron MoE encoding audio into text embedding space without degrading text performance.

Ax Anthony Hu, Václav Volhejn, Adrien Ramanana Rahary, Chris Mulder, Aditya Makkar, Alyx Liao, Amélie Royer, Manu Orsini, Adam Jelley, Eloi Alonso, Florian Laurent, Fredrik Norén, James Swingos, Jan Hünermann, Kent Rollins, Lucas Hosseini, Matthieu Le Cauchois, Maxim Peter, Pim de Witte, Tim Brown, Vincent Micheli, Moritz Böhle, Gabriel de Marmiesse, Viktoriia Sharmanska, Lucia Specia, Michael Black, Patrick Pérez 23d ago

Multiplayer Interactive World Models with Representation Autoencoders

First multiplayer world model for dynamic environments conditioning on multiple agents' action streams with scene coherence.

Ax Lorenzo Tarricone, Helen E. Eisenach, Aiko Muraishi, Charlotte M. Deane 23d ago

Design-CP: Context Parallelism for Design of Protein Nanoparticles

Design-CP: context parallel inference strategies for RFdiffusion 3 protein design using row/grid sharding with ring attention.

Ax Hao Hu, Xue-shan Ai 23d ago

Exogenous Dropout: A Simple, Strong Baseline for Corruption-Robust Time Series Forecasting with Covariates

Exogenous dropout technique for robust time series forecasting with noisy or missing exogenous covariates.

Ax Anis Hamadouche, Amir Hussain 23d ago

Empirical Minimal-Realisation Compression of Deep Neural Networks via Controllability-Observability Tests

DNN compression via controllability-observability framework for state-order reduction treating networks as dynamical systems.

Ax Haiwen Yi, Xinyuan Song 23d ago

Learning to Control LLM Agent Harnesses with Offline Reinforcement Learning

Method for learning to control LLM agent execution harnesses via offline reinforcement learning while keeping LLM frozen.

Ax Linjie Xu, David Wipf 23d ago

Parameter-Free Encoders Remain Viable for RDB Foundation Models

Study of parameter-free encoders for relational database foundation models to predict missing values across varied prediction tasks.

Ax Guangyuan Wu, Weining Cao, Zehui Tan, Yuan Yao, Hengfeng Wei, Taolue Chen, Xiaoxing Ma 23d ago

InvWeaver: Deductive Feedback for Invariant Synthesis in Interacting-Loop Programs

InvWeaver: neuro-symbolic framework using LLMs for loop invariant synthesis in multi-loop programs via deductive feedback.

Ax Zhaoyu Bai, Jiaqi Cai 23d ago

PatchOptic for Shared-State LLM Workflows with Projected Views and Verified Structured Updates

PatchOptic system for LLM agentic workflows using projected views and verified structured updates over shared state.

Ax Muhammad Zain Amin, Kibele Sebnem Yildirim 23d ago

Self-Review Reinforcement Learning (SRRL) with Cross-Episode Memory and Policy Distillation

Self-review RL with cross-episode memory for training LLMs with sparse/delayed feedback and policy distillation.

Ax Omar Al-Refai, Ibrahim Shahbaz, Adam Ali Husseinat, Eman Hammad 23d ago

Federated Physics-Grounded Reinforcement Learning for Distributed Stability Control in Smart Grids

FedPPO-PG combines federated learning with physics-grounded PPO for multi-agent stability control in smart grids.

Ax Xinrui He, Mengting Ai, Junting Wang, Curtiss B. Cook, Jingrui He 23d ago

SafeImpute: Reliable Clinical Data Imputation via Conformal Selection

Conformal prediction method for reliable clinical data imputation with uncertainty quantification for high-stakes decisions.

Ax Nima Eshraghi, Lovedeep Gondara, Yuqing Huang, Sagarika Suresh, Leizer Teran, Jithin Pradeep, Xiaotong Xu, Fanny Chevalier 23d ago

A Coin Flip Per Token: Bernoulli Sparse Steering of Large Language Models

Stochastic sparse token steering for LLMs using probabilistic gating to reduce per-token perturbation overhead.

Ax Katherine Avery, Bruno Castro da Silva, David Jensen 23d ago

Safe Bayesian Optimization with Counterfactual Policies

Safe Bayesian optimization with counterfactual policies ensuring interventions don't degrade outcomes below baselines.

Ax Taniya Shaji, Abhay Sobhanan, Christof Defryn 23d ago

Deep Reinforcement Learning for Dynamic Battery Management of Autonomous Order Pickers

Deep reinforcement learning for autonomous mobile robot battery charging optimization in warehouse environments.

Ax Kabir Dev Paul Baghel, Radu Timofte, Dmitry Ignatov 23d ago

LLM-Driven Neural Network Generation with Same-Family Architecture Guidance: Disentangling Transfer and Adaptation

LLM-guided generation of neural network architecture improvements using same-family source models as guidance.

Ax Bowen Xue, Zihan Min, Xingyang Li, Zhekai Zhang, Haocheng Xi, Lvmin Zhang, Maneesh Agrawala, Jun-Yan Zhu, Song Han, Yujun Lin, Muyang Li 23d ago

FourTune: Towards Fully 4-Bit Efficient Post-Training for Diffusion Models

FourTune: 4-bit quantization method for efficient post-training fine-tuning of diffusion models with low memory overhead.

Ax Qi Zhao, Christian Wressnegger 23d ago

Two Sides of the Same Coin: Learning the Backdoor to Remove the Backdoor

Training-time defense against neural backdoor attacks by learning to distinguish poisoned samples from benign ones.

Ax Zhiyuan Chen, Jing Hu, Junzhe Wang, Yueyang Huang, Xinyi Yang, Zhaoyang Wang, Feng Zhu 23d ago

AbICL: In-Context Learning for Antigen-Specific Antibody Affinity Ranking

In-context learning applied to antibody affinity ranking for drug discovery by capturing antigen-specific binding landscapes.

Ax Shuze Daniel Liu, Claire Chen, Jiabao Sean Xiao, Xin Chen, David Simchi-Levi 23d ago

Strategic Bargaining in Multi-Buyer Markets: Reinforcement Learning from Verifiable Rewards for LLM Negotiations

Multi-buyer negotiation using reinforcement learning with LLMs for strategic bargaining with private information.

Ax Pan Li, Kai Chen, Shuai Chang, Shengzhi Zhang, Peizhuo Lv, Jinwen He 23d ago

Differentially Private Natural Gradient Descent

Differentially private natural gradient descent improving optimization efficiency under privacy constraints.

Ax Noel Thomas 23d ago

No Subspace to Track: Non-Identifiability and Optimizer State in Low-Rank Training

Study demonstrating non-identifiability of subspaces in low-rank LLM training, challenging GaLore optimizer assumptions.

Ax Sahasrajit Sarmasarkar, Anastasia Koloskova, Sanmi Koyejo 23d ago

Auditing of Unlearning Algorithms

Practical auditing of unlearning algorithms using membership inference attacks to verify data removal.

Ax Jean-Francois Bonbhel 23d ago

K-ABENA: K-Adaptive Backpropagation with Error-based N-exclusion Algorithm : (Compensated Loss-Based Sample Exclusion with Unbiased Gradient Estimation)

K-ABENA: selective gradient computation framework reducing training costs via sample exclusion and unbiased estimation.

Ax Chenyu Zhou 23d ago