Prose2Policy: LLM pipeline translating natural-language access control policies into executable Rego code. End-to-end pipeline with test generation and validation.
Empirical study of GPT-4.1 behavior in gambling tasks under different persona prompts. Examines whether LLM risk behavior reflects principled patterns or prompt mimicry.
Regularized latent dynamics prediction as baseline for behavioral foundation models, examining how state feature choice affects task adaptability and reward function expressivity.
Framework for governing embodied AI in critical infrastructure through hybrid oversight modes and bounded autonomy, addressing resilience beyond statistically representable uncertainty.
AsgardBench evaluates visually grounded interactive planning for embodied AI agents, focusing on high-level action sequence generation with plan adaptation based on visual feedback.
Monte Carlo simulation evaluating prompt engineering strategies for LLM-generated personality assessment items across zero-shot, few-shot, and persona-based designs.
Lean 4 formalization of Vlasov-Maxwell-Landau equilibrium using AI reasoning (Gemini DeepThink) and agentic tools (Claude Code) demonstrating AI-assisted mathematical research workflows.
Framework combining computational argumentation with LLMs to create transparent, verifiable AI agents that reason collaboratively with humans rather than providing opaque recommendations.
Agent Rosetta uses LLMs as specialized scientific agents for protein design tasks, emulating reasoning and tool use for broad design pipelines beyond canonical amino acids.
MAC automatically learns constitutional AI rules from training data using multi-agent approaches, improving upon existing LLM-based prompt optimizers through structured learning.
Formal proof that safety is non-compositional: two individually incapable agents can collectively reach forbidden goals through emergent conjunctive capability dependencies.
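A minimal toy sketch of the non-compositionality claim, not the paper's formalism: capabilities are modeled as plain sets and a forbidden goal as a conjunction of required capabilities, so each agent is safe alone while the coalition is not. All names here are illustrative assumptions.

```python
# Hypothetical capability model: each agent holds a set of atomic capabilities;
# a goal is reachable iff all of its required capabilities are available.
FORBIDDEN_GOAL = frozenset({"read_secret", "exfiltrate"})

def can_reach(capabilities: frozenset, goal: frozenset) -> bool:
    """A goal is reachable iff every required capability is present."""
    return goal <= capabilities

agent_a = frozenset({"read_secret"})   # individually cannot reach the goal
agent_b = frozenset({"exfiltrate"})    # individually cannot reach the goal

assert not can_reach(agent_a, FORBIDDEN_GOAL)
assert not can_reach(agent_b, FORBIDDEN_GOAL)
# Pooled capabilities satisfy the conjunction, so per-agent safety checks
# do not compose into coalition safety:
assert can_reach(agent_a | agent_b, FORBIDDEN_GOAL)
```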
petscagent-bench evaluates AI-generated scientific code for HPC libraries beyond test-case matching, assessing solver selection, API conventions, memory management, and performance.
Write-time gating mechanism filters incoming knowledge objects based on salience scores to improve retrieval-augmented generation accuracy and mirror biological memory archiving.
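The write-time gating idea can be sketched as a threshold filter applied before a knowledge object ever enters the retrieval store; the class names, threshold value, and salience scorer here are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeObject:
    text: str
    salience: float  # assumed score in [0, 1], e.g. from a learned scorer

@dataclass
class GatedMemory:
    """Write-time gate: only objects above a salience threshold are archived."""
    threshold: float = 0.5
    store: list = field(default_factory=list)

    def write(self, obj: KnowledgeObject) -> bool:
        if obj.salience >= self.threshold:
            self.store.append(obj)
            return True
        return False  # filtered at write time, before it can pollute retrieval

mem = GatedMemory(threshold=0.6)
mem.write(KnowledgeObject("core fact", 0.9))   # passes the gate
mem.write(KnowledgeObject("chit-chat", 0.2))   # rejected at write time
```

Gating at write time keeps the store small, which is what lets retrieval stay accurate as the memory grows.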
IRAM-Omega-Q computational architecture uses quantum-like density matrices to model internal regulation and uncertainty management in artificial agents under stochastic perturbation.
Model Workspace Protocol (MWP) simplifies agentic AI orchestration using folder structures for sequential workflows, reducing engineering overhead compared to multi-agent frameworks.
Enhances OpenVLA vision-language-action models with synthetic instruction augmentation to improve zero-shot performance in new environments for embodied AI tasks.
POaaS optimizes prompts for on-device small language models through minimal edits, reducing hallucinations and improving accuracy without requiring lengthy structured instructions.
Context alignment pre-processor enhances LLM dialogue coherence by resolving contextual misalignment when users omit premises, simplify references, or shift context during interactions.
ARISE uses hierarchical reinforcement learning to improve mathematical reasoning in LLMs by developing reusable strategies that accumulate during training rather than treating problems in isolation.
VIGIL deploys edge-resident AI agents for enterprise IT support, performing diagnosis, knowledge retrieval, and policy-governed remediation on user devices with consent and observability.
NeuronSpark: 0.9B-parameter spiking neural network language model using state-space dynamics and surrogate gradients without Transformer distillation.
SQL-ASTRA: agentic reinforcement learning framework for text-to-SQL using column-set matching and trajectory aggregation for credit assignment.
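One way to read the column-set matching signal is as a dense partial-credit reward: instead of exact-match execution accuracy, score the overlap between predicted and gold column sets. The sketch below is a plausible rendering under that assumption; the crude regex extraction and F1 formulation are illustrative, not SQL-ASTRA's actual method.

```python
import re

def column_set(sql: str, known_columns: set) -> set:
    """Crude column extraction: identifier tokens that match a known schema
    column. Illustrative only -- a real system would parse the SQL AST."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql.lower()))
    return tokens & known_columns

def column_set_reward(pred_sql: str, gold_sql: str, known_columns: set) -> float:
    """Dense reward: F1 between predicted and gold column sets, giving
    partial credit for trajectories that pick some of the right columns."""
    pred = column_set(pred_sql, known_columns)
    gold = column_set(gold_sql, known_columns)
    if not pred or not gold:
        return 0.0
    precision = len(pred & gold) / len(pred)
    recall = len(pred & gold) / len(gold)
    return 2 * precision * recall / (precision + recall)

cols = {"name", "age", "salary"}
r = column_set_reward("SELECT name, age FROM emp", "SELECT name, salary FROM emp", cols)
```

A graded reward like this gives the RL agent credit for partially correct queries, which is what makes credit assignment over long text-to-SQL trajectories tractable.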
Data contamination audit reveals public LLM benchmarks may have leaked into training data; questions claims of superhuman performance.
Framework for safe LLM-based IoT agents using dual-stage intent analysis to prevent hallucination and reduce interaction overhead.
MOSAIC: modular control token approach for context-dependent safety alignment in LLMs across applications and regions.
Adaptive theory of mind framework for LLM-based multi-agent coordination, aligning agents' reasoning depth about others' mental states.
NeSy-Route neuro-symbolic benchmark for constrained route planning in remote sensing, evaluating perception, reasoning, and planning of MLLMs.
Machine-learning approach for predicting and reasoning over high-dimensional discrete event sequences derived from vehicle diagnostic trouble codes.
FactorEngine framework for automated discovery of interpretable alpha factors from market data, combining symbolic and neural approaches for quantitative investment.
Empirical analysis showing negative-only feedback training for LLMs matches or exceeds standard RLHF, exploring theoretical foundations through a via negativa framework.
Introduces Option Query Language (OQL), a domain-specific intermediate representation for translating natural language into executable financial option strategies.
Studies how visual distractions undermine moral reasoning in vision-language models, identifying gaps in multimodal safety techniques.
TRUST-SQL uses reinforcement learning for text-to-SQL over unknown database schemas, where agents actively identify relevant tables from massive metadata.
RetailBench evaluates long-horizon autonomous decision-making of LLM agents in realistic dynamic retail environments with stochastic conditions.
Hybrid-evidential deductive reasoning approach for open-vocabulary multimodal emotion recognition using MLLMs.
Causal evaluation protocol measuring whether intermediate structures (rubrics, checklists) causally determine LLM outputs or merely accompany them.
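The causal question here can be phrased as an intervention: swap or ablate the intermediate structure and measure whether the output moves. A minimal sketch under that reading, with a stub model standing in for the LLM (all names are hypothetical):

```python
def causal_effect(model, prompt, rubric, perturbed_rubric, metric):
    """Intervene on the intermediate structure and measure the output change.
    If replacing the rubric leaves the score unchanged, the rubric merely
    accompanies the output rather than causally determining it."""
    y = model(prompt, rubric)
    y_do = model(prompt, perturbed_rubric)  # do(rubric := perturbed)
    return metric(y) - metric(y_do)

# Stub "model" that genuinely conditions on the rubric, so the intervention
# produces a nonzero effect:
model = lambda prompt, rubric: len(rubric)
effect = causal_effect(model, "grade this essay", "strict rubric", "", metric=float)
```

A model that ignores the rubric would yield `effect == 0`, which is exactly the "accompanies but does not determine" failure mode the protocol is designed to detect.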
Multimodal LLM (ExpressMind) for expressway operation, applying cognitive intelligence to transportation systems beyond rule-based approaches.
Investigates customization approaches for smaller open-source LLMs to improve domain-specific code generation without relying on large proprietary models.
Proposes guardrails for LLM-enabled robots allocating scarce assistance across multiple users with conflicting values and unpredictable LLM behavior.
BenchPreS evaluates whether memory-based LLM personalization appropriately suppresses user preferences in context-sensitive communication settings.
V-DyKnow benchmark evaluates how vision-language models handle time-sensitive knowledge that becomes outdated after training.
Framework for runtime governance of LLM-based AI agents, balancing task completion with legal and reputational costs through execution-path monitoring.
Analyzes AI reasoning about geopolitical conflicts using temporally grounded case study of 2026 Middle East conflict after model training cutoffs.
Integrates constraint propagation into dynamic programming to bridge gap between state-based and constraint-based paradigms for combinatorial problems.
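A small illustration of the hybrid idea, assuming a toy problem of my own choosing (subset-sum with a cardinality constraint, not an example from the paper): the DP enumerates states, while the side constraint is propagated at each transition to prune states that can never become feasible.

```python
def constrained_subset_sum(values, target, max_items):
    """DP over (partial_sum, items_used) states. Both the sum bound and the
    cardinality constraint are propagated at each transition, pruning the
    state space instead of checking feasibility only at the end."""
    states = {(0, 0)}
    for v in values:
        new_states = set(states)
        for s, k in states:
            # Constraint propagation: extend a state only if it stays feasible.
            if s + v <= target and k + 1 <= max_items:
                new_states.add((s + v, k + 1))
        states = new_states
    return any(s == target for s, _ in states)
```

Pruning during the DP sweep is what distinguishes this from a plain state-based DP, where infeasible partial states would be carried along and rejected only at the end.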
Pipeline for developing norm-compliant reinforcement learning agents inspired by the Pinocchio story, addressing safe AI integration into society.
Fine-tuning LLMs on journal publication decisions to enable models to assess scientific merit and predict promising research directions.
Mobile app teaching digital literacy and prebunking misinformation tactics through interactive challenges in nine languages.
Code LLM series (7B-40B) using code-flow multi-stage training paradigm to capture dynamic software logic evolution.
Investigation of how user personalization and mental health disclosure affect harmful behavior in tool-using LLM agents.
Benchmark for evaluating continual learning in biomedical NLP across task-diverse datasets with robustness and efficiency metrics.