Isolater - Feed

Ax Adam Bayley, Xiaodan Zhu, Raquel Aoki, Yanshuai Cao, Kevin H. Wilson 29d ago

Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits

Theoretical and empirical evaluation of using LLM-generated preferences to warm-start contextual bandits, examining alignment with actual user preferences.

Ax Kamalasankari Subramaniakuppusamy, Jugal Gajjar 29d ago

Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?

Analysis of the stability of post-hoc feature attribution methods for vision systems under input perturbations, introducing an evaluation suite.

Ax Murtuza Shahzad, Joseph Wilson, Ibrahim Al Azher, Hamed Alhoori, Mona Rahimi 29d ago

From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks

LLM-based code generation for security vulnerabilities using CAPEC and CWE frameworks, addressing gaps in existing vulnerability datasets.

Ax Lingjun Zhao, Dayeon Ki, Marine Carpuat, Hal Daumé III 29d ago

Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation

Study of cultural bias in LLM text generation, introducing the task of generating culturally-adapted artwork descriptions for different audience groups.

Ax Jackson G. Lu, Gerui Gloria Zhao, Anna Manyi Zheng 29d ago

Generative AI Use in Entrepreneurship: An Integrative Review and an Empowerment-Entrapment Framework

Integrative review of generative AI impact on entrepreneurship across opportunity recognition, evaluation, resource assembly, and venture launch stages.

Ax John T. Halloran 29d ago

Understanding the Effects of Safety Unalignment on Large Language Models

Research on safety alignment vulnerabilities in LLMs, examining jailbreak-tuning and weight orthogonalization methods that can disable safety guardrails.

Ax Sahaj Singh Maini, Robert L. Goldstone, Zoran Tiganj 29d ago

High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination

Comparative study of LLM vs human coordination in group games, revealing volatility and action bias differences in adaptive strategies.

Ax Ethan Reid 29d ago

Moondream Segmentation: From Words to Masks

Vision-language model extension for referring image segmentation using autoregressive decoding and reinforcement learning refinement.

Ax Hita Kambhamettu, Will Crichton, Sean Welleck, Harrison Goldstein, Andrew Head 29d ago

Making Written Theorems Explorable by Grounding Them in Formal Representations

System grounding LLM-generated explanations in formal representations to enable interactive exploration of mathematical proofs.

Ax Hita Kambhamettu, Bhavana Dalvi Mishra, Andrew Head, Jonathan Bragg, Aakanksha Naik, Joseph Chee Chang, Pao Siangliulue 29d ago

LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape

Tool for developing research ideas through dynamic literature contextualization and critique using LLMs.

Ax Wei Zou, Mingwen Dong, Miguel Romero Calvo, Wei Zou, Shuaichen Chang, Jiang Guo, Dongkyu Lee, Xing Niu, Xiaofei Ma, Yanjun Qi, Jiarong Jiang 29d ago

Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

Security analysis of memory-based LLM web agents, demonstrating environment-injected poisoning attacks through persistent memory exploitation.

Ax Hao Li, Liwei Zou, Wenping Yin, Gulsen Taskin, Naoto Yokoya, Danfeng Hong, Wufan Zhao 29d ago

Smart Transfer: Leveraging Vision Foundation Model for Rapid Building Damage Mapping with Post-Earthquake VHR Imagery

Vision foundation model applied to rapid building damage mapping from post-earthquake imagery for disaster response.

Ax Lei Song, Shihan Guan, Youyong Kong 29d ago

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

Continual graph learning method addressing feature drift in non-exemplar settings using analytic continual learning.

Ax Weimin Liu, Jiyuan Qiu, Wenjun Wang, Joshua H. Meng 29d ago

Cross-Vehicle 3D Geometric Consistency for Self-Supervised Surround Depth Estimation on Articulated Vehicles

Self-supervised depth estimation for articulated vehicles using cross-vehicle 3D geometric consistency.

Ax Shufan Jiang, Chios Chen, Zhiyang Chen 29d ago

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

Game benchmark with 124 bugs for evaluating LLMs' ability to autonomously discover bugs as QA engineers in dynamic environments.

Ax Cunyang Wei, Siddharth Singh, Aishwarya Sarkar, Daniel Nichols, Tisha Patel, Aditya K. Ranjan, Sayan Ghosh, Ali Jannesari, Nathan R. Tallent, Abhinav Bhatele 29d ago

Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

Distributed training approach for graph neural networks using communication-free sampling and hybrid parallelism.

Ax Haruhi Shida, Koo Imai, Keigo Kansa 29d ago

Generalization Limits of Reinforcement Learning Alignment

Theoretical analysis of reinforcement learning alignment limitations in LLMs, demonstrating generalization failures through compound jailbreak attacks.

Ax Farhad Pourkamali-Anaraki 29d ago

Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration

Efficient model compression using randomized subspace iteration for low-rank decomposition of pretrained models.

Ax Vira Kasprova, Amruta Parulekar, Abdulrahman AlRabah, Krishna Agaram, Ritwik Garg, Sagar Jha, Nimet Beyza Bozdag, Dilek Hakkani-Tur 29d ago

Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems

Study of sycophancy propagation in multi-agent LLM systems, examining how agents' awareness of others' biases affects collaborative discussions.

Ax Kavana Venkatesh, Jiaming Cui 29d ago

Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems

Large-scale empirical study of coordination dynamics in LLM multi-agent systems, analyzing scaling behavior and power laws in collective cognition.

Ax Yao Zhao, Zhiyue Zhang, Yanxun Xu 29d ago

Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis

Agentic framework using LLMs for automated clinical trial evidence synthesis and meta-analysis with eligibility-aware study selection.

Ax Matthew Levinson 29d ago

Finding Belief Geometries with Sparse Autoencoders

Using sparse autoencoders to understand geometric structure of belief representations in transformer models and LLMs.

Ax Yuheng Zhang, Mingyue Huo, Minghao Zhu, Mengxue Zhang, Nan Jiang 29d ago

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models

Token-space adversarial attacks on reward models used in RLHF, introducing token mapping perturbation attack paradigm beyond semantic manipulation.

Ax Yuhui Lin, Siyue Yu, Yuxing Yang, Guangliang Cheng, Jimin Xiao 29d ago

Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs

Framework for reducing computational overhead in 3D multimodal LLMs through adaptive token reduction for resource-constrained deployment.

Ax Fanwei Zeng, Changtao Miao, Jing Huang, Zhiya Tan, Shutao Gong, Xiaoming Yu, Yang Wang, Weibin Yao, Joey Tianyi Zhou, Jianshu Li, Yin Yan 29d ago

DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning

AI agent system for document forgery detection using evidence-grounded reasoning, combining detection, localization, and explanation for document safety.

Ax Rodney Jehu-Appiah 29d ago

Trivial Vocabulary Bans Improve LLM Reasoning More Than Deep Linguistic Constraints

Controlled replication study examining vocabulary constraints versus linguistic structures in LLM reasoning, testing E-Prime effects on cognition.

Ax Yihong Dong, Xiaoha Jian, Xue Jiang, Xuyuan Guo, Zhiyuan Fan, Jiaru Qian, Kechi Zhang, Jia Li, Zhi Jin, Ge Li 29d ago

Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

Systematic evaluation framework for LLM formal reasoning capabilities using Chomsky hierarchy and computational complexity theory.

Ax Junwei You, Pei Li, Zhuoyu Jiang, Weizhe Tang, Zilin Huang, Rui Gan, Jiaxi Liu, Yan Zhao, Sikai Chen, Bin Ran 29d ago

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views

Multimodal LLM benchmark for autonomous driving with vehicle, infrastructure, and cooperative viewpoints, evaluating reasoning across V2X conditions.

Ax Mirali Purohit, Bimal Gajera, Irish Mehta, Bhanu Tokas, Jacob Adler, Steven Lu, Scott Dickenshied, Serina Diniega, Brian Bue, Umaa Rebbapragada, Hannah Kerner 29d ago

MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

Multi-sensor foundation model merging HiRISE, CTX, and THEMIS Mars remote sensing data via an equal-validation-loss alignment strategy.

Ax Puyu Zeng, Zhaoxi Wang, Zhixu Duan, Liang Feng, Shaobo Wang, Cunxiang Wang, Jinghang Wang, Bing Zhao, Hu Wei, Linfeng Zhang 29d ago

IndustryCode: A Benchmark for Industry Code Generation

Multi-domain benchmark for industry code generation across finance, automation, and aerospace using LLMs, addressing single-domain limitations.

Ax Giyeong Oh, Junghyun Lee, Jaehyun Park, Youngjae Yu, Wonho Bae, Junhyug Noh 29d ago

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

Evaluation of active preference learning versus random sampling in online DPO for modern LLMs, showing random sampling is surprisingly competitive.

Ax KrishnaSaiReddy Patil 29d ago

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems

Formal framework for verifiable delegation chains in multi-agent AI systems, defining properties for authorization tracking and policy enforcement.

Ax Yongsu Ahn, Nam Wook Kim, Benjamin Bach 29d ago

Disrupting Cognitive Passivity: Rethinking AI-Assisted Data Literacy through Cognitive Alignment

Framework for improving data literacy in AI-assisted analysis by disrupting cognitive passivity through guided reasoning rather than direct answers.

Ax Shreshth Saini, Hakan Gedik, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik 29d ago

LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers

Diffusion transformer method for inverse tone-mapping, converting 8-bit SDR video content to perceptually accurate 10-bit HDR.

Ax Tianze Xu, Yanzhao Zheng, Pengrui Lu, Lyumanshan Ye, Yong Wu, Zhentao Zhang, Yuanqiang Yu, Chao Ma, Jihuai Zhu, Pengfei Liu, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu 29d ago

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

Rubric-based RL framework bridging response-level and token-level rewards for LLM alignment in instruction following tasks.

Ax Dexiang Li, Zhenning Che, Haijun Zhang, Dongliang Zhou, Zhao Zhang, Yahong Han 29d ago

PaveBench: A Versatile Benchmark for Pavement Distress Perception and Interactive Vision-Language Analysis

Benchmark dataset for pavement distress assessment using vision-language models, requiring quantitative analysis and interactive decision support.

Ax Lik Tung Fu, Jie Zhou, Shaokai Ren, Mengli Zhang, Jia Xiong, Hugo Jiang, Nan Guan, Xi Wang, Jun Yang 29d ago

ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs

Task-specific LLM framework for generating SystemVerilog assertions for hardware verification, addressing data scarcity and accuracy challenges.

Ax Xinhao Wang, Zhonyu Xia, Zhiwei Lin, Zhe Li, Yongtao Wang 29d ago

QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models

Quantization-aware vision token pruning for multimodal LLMs, optimizing coupled compression techniques for resource-constrained deployment.

Ax Hongbo Duan, Peiyu Zhuang, Yi Liu, Zhengyang Zhang, Yuxin Zhang, Pengting Luo, Fangming Liu, Xueqian Wang 29d ago

NavCrafter: Exploring 3D Scenes from a Single Image

Framework for synthesizing novel-view video sequences from single images using diffusion models with a geometry-aware expansion strategy.

Ax Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, Tianyue Luo 29d ago

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

First comprehensive security analysis of Agent Skills, an open standard for modular LLM agent packages, covering threat taxonomy and vulnerabilities.

Ax Niloofar Asefi, Tianning Wu, Ruoying He, Ashesh Chattopadhyay 29d ago

High-resolution probabilistic estimation of three-dimensional regional ocean dynamics from sparse surface observations

Conditional diffusion model for reconstructing 3D ocean states from sparse surface observations using satellite and in situ data.

Ax Allen He, Qi Liu, Kun Liu, Xinchen Liu, Wu Liu 29d ago

A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos

End-to-end training method for localizing temporal video segments matching sentence queries, addressing task discrepancy in video backbone optimization.

Ax Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan, Shu Wang 29d ago

LLM+Graph@VLDB'2025 Workshop Summary

Workshop on integrating LLMs with graph-structured data, covering algorithms and systems for bridging LLMs, graph databases, and ML for practical applications.

Ax Baban Gain, Asif Ekbal, Trilok Nath Singh 29d ago

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

Study of weight-space model merging for multilingual machine translation, evaluating behavior when combining independently fine-tuned models.

Ax Hai Nguyen-Truong, Alper Balbay, Tunga Bayrak 29d ago

Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models

Procedural geometry data generation for geometry education, with visual grounding by vision-language models framed as referring image segmentation.

Ax Gilad Abiri 29d ago