Isolater - Feed

HN roshanshaik 3/26/2026

LiteLLM Supply Chain Attack: Defense in Depth Is the Only AI Security Strategy

The open-source LLM proxy LiteLLM suffered a supply chain attack in March 2026 in which backdoored packages harvested credentials for three hours, demonstrating the need for defense-in-depth security strategies.

HN CzaxTanmay 3/26/2026

Show HN: Spectator – A programming language for Cybersecurity and Hacking

Spectator: new scripting language for security work combining Bash/Python functionality with built-in security modules and a GUI framework.

HN An0n_Jon 3/26/2026

Show HN: Orloj – agent infrastructure as code (YAML and GitOps)

Open-source orchestration runtime for multi-agent AI systems using declarative YAML manifests. GitOps approach to agent governance and workflows.

HN judekim 3/26/2026

Show HN: Scope – a beautiful open-source web client for Stremio

Open-source web client for Stremio streaming platform with syncing and stream selection features.

HN steadeepanda 3/26/2026

Show HN: Agent Ruler new update v0.1.9

Agent Ruler v0.1.9 update: reference monitor with confinement for AI agent workflows, adding security/safety layer outside agent guardrails.

HN chaudharydeepak 3/26/2026

Prompt Guard – MitM proxy that blocks credentials before they reach AI APIs

HTTPS MITM proxy intercepting prompts to AI APIs/assistants, detecting and blocking sensitive data before transmission to third-party servers.

HN obilgic 3/26/2026

Show HN: Agent Kernel – Three Markdown files that make any AI agent stateful

Three Markdown files enabling stateless AI agents to maintain memory across sessions using Git repos. Works with coding agents like Claude, Cursor, and Windsurf.

HN MACCRE 3/26/2026

HN: Surviving the litellm supply chain attack with a pure ctypes OS Vault

Local-first AI orchestration framework (MACCREv2) designed to avoid trusting third-party wrappers with API keys/filesystem. Response to litellm supply chain attack.

HN nsagent 3/26/2026

Research Shows Verbatim Recall of Copyrighted Books in LLMs

Research study demonstrating verbatim recall of copyrighted books in fine-tuned LLMs across cross-author and within-author scenarios.

Ax Burc Gokden 3/26/2026

PLDR-LLMs Reason At Self-Organized Criticality

Theoretical analysis of LLM reasoning properties at self-organized criticality with connections to phase transitions and scaling functions.

Ax Yenchia Feng, Chirag Sharma, Karime Maamari 3/26/2026

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

Environment Maps: Persistent agent-agnostic representation for reducing cascading errors in long-horizon LLM-based software automation tasks.

Ax Zeinab Dehghani, Rameez Raja Kureshi, Koorosh Aslansefat, Faezeh Alsadat Abedi, Dhavalkumar Thakker, Lisa Greaves, Bhupesh Kumar Mishra, Baseer Ahmad, Tanaya Maslekar 3/26/2026

Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework

Safety-focused evaluation framework for multi-agent voice-enabled smart speaker in care homes covering resident data access and task scheduling.

Ax Yi Han, Lingfei Qian, Yan Wang, Yueru He, Xueqing Peng, Dongji Feng, Yankai Chen, Haohang Li, Yupeng Cao, Jimin Huang, Xue Liu, Jian-Yun Nie, Sophia Ananiadou 3/26/2026

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

EnterpriseArena: Benchmark evaluating LLM agents as CFOs for resource allocation under uncertainty in dynamic business environments.

Ax Marc-Antoine Provost, Nejc Ilenic, Christopher Solinas, Philippe Beardsell 3/26/2026

GTO Wizard Benchmark

Public API and evaluation framework for benchmarking poker algorithms against GTO Wizard, a superhuman HUNL poker agent.

Ax Ashish Malik, Caleb Lowe, Aayam Shrestha, Stefan Lee, Fuxin Li, Alan Fern 3/26/2026

Grounding Vision and Language to 3D Masks for Long-Horizon Box Rearrangement

Method for long-horizon 3D box rearrangement using vision-language grounding and 3D masks for multi-step planning from natural language.

Ax Jerin George Mathew, Sumayya Taher, Anindita Kundu, Denilson Barbosa 3/26/2026

LLMs Do Not Grade Essays Like Humans

Evaluation comparing LLM essay scoring with human grading across GPT and Llama models, finding weak agreement in standard settings.

Ax Franck Ndzomga 3/26/2026

Efficient Benchmarking of AI Agents

Study on efficient benchmarking of AI agents showing how task subsets can preserve agent rankings while reducing evaluation costs.

Ax Han Zheng, Yining Ma, Brandon Araki, Jingkai Chen, Cathy Wu 3/26/2026

Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

Learning-guided prioritized planning combining ML and search-based solvers for lifelong multi-agent pathfinding in warehouse automation.

Ax Yuhao Chen, Yi Xu, Xinyun Ding, Xiang Fang, Shuochen Liu, Luxi Lin, Qingyu Zhang, Ya Li, Quan Liu, Tong Xu 3/26/2026

VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents

VehicleMemBench: Benchmark for evaluating long-term memory in multi-user in-vehicle agents handling preference conflicts and temporal dynamics.

Ax Chung-En Johnny Yu, Brian Jalaian, Nathaniel D. Bastian 3/26/2026

SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

SCoOP: Training-free uncertainty quantification framework for multi-VLM systems using semantic-consistent opinion pooling.

Ax Forest Agostinelli 3/26/2026

The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search

DeepXube: Free open-source Python package for pathfinding using learned heuristic functions from deep RL and search algorithms.

Ax Keru Hua, Ding Wang, Yaoying Gu, Xiaoguang Ma 3/26/2026

DUPLEX: Agentic Dual-System Planning via LLM-Driven Information Extraction

DUPLEX: Neuro-symbolic agentic architecture combining LLMs with schema-guided information extraction for robust robotic task planning in long-horizon domains.

Ax Zhixuan Bao, Zhuoyi Lin, Jiageng Wang, Jinhai Hu, Yuan Gao, Yaoxin Wu, Xiaoli Li, Xun Xu 3/26/2026

AnalogAgent: Self-Improving Analog Circuit Design Automation with LLM Agents

AnalogAgent: LLM-based agentic framework for automated analog circuit design using multi-model loops to preserve domain-specific insights and context.

Ax Lijing Luo, Yiben Luo, Alexey Gorbatovski, Sergey Kovalchuk, Xiaodan Liang 3/26/2026

From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

Empirical study analyzing 2000+ RL papers to create quantitative taxonomy of reinforcement learning environments and technological trends.

Ax Xusen Guo, Mingxing Peng, Hongliang Lu, Hai Yang, Jun Ma, Yuxuan Liang 3/26/2026

Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing

MAPUS: LLM-based multi-agent framework for personalized and fair participatory urban sensing, modeling participants as autonomous agents with preferences.

Ax Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang 3/26/2026

ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents

ELITE framework for self-improving embodied agents using vision-language models with experiential learning and intent-aware transfer to bridge vision-action gap.

Ax Florian Odi Stummer 3/26/2026

Enhanced Mycelium of Thought (EMoT): A Bio-Inspired Hierarchical Reasoning Architecture with Strategic Dormancy and Mnemonic Encoding

Enhanced Mycelium of Thought (EMoT): bio-inspired hierarchical reasoning architecture for LLMs with four-level hierarchy, strategic dormancy, and mnemonic encoding.

Ax Hadar Peer, Carlos Hernandez, Sven Koenig, Ariel Felner, Oren Salzman 3/26/2026

Bridging the Evaluation Gap: Standardized Benchmarks for Multi-Objective Search

Standardized benchmarks and evaluation framework for multi-objective search addressing fragmentation in empirical evaluation.

Ax Yunbo Long 3/26/2026

AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model

AutoProf: multi-agent orchestration framework for autonomous AI research with persistent world model, gap analysis, and inter-agent verification mechanisms.

Ax John Ray B. Martinez 3/26/2026

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

Multi-agent framework with specialist agents for medical multiple-choice question answering, improving calibration and confidence scoring through verification.

Ax Shalender Singh, Vishnu Priya Singh Parmar 3/26/2026

From Liar Paradox to Incongruent Sets: A Normal Form for Self-Reference

Incongruent normal form: a structural representation for self-referential sentences that preserves classical semantics.

Ax Biplab Pal, Santanu Bhattacharya 3/26/2026

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Markovian framework for auditing reliability and oversight costs in agentic AI systems operating as stochastic policies with sequential decisions and tool calls.

Ax Christopher M. Ackerman, Nina Panickssery 3/26/2026

Mitigating Many-Shot Jailbreaking

Analysis of many-shot jailbreaking technique exploiting long context windows; probes effectiveness and develops mitigation strategies for LLM safety.

Ax Christopher Ackerman 3/26/2026

Evidence for Limited Metacognition in LLMs

Novel methodology quantitatively evaluating metacognitive abilities in LLMs, testing self-awareness without relying on model self-reports.

Ax Tianpeng Zheng, Zhehan Jiang, Jiayi Liu, Shicong Feng 3/26/2026

Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking

Computerized Adaptive Testing framework grounded in Item Response Theory for cost-effective and scalable evaluation of LLMs in medical benchmarking.

Ax Fangyu Ding, Ding Ding, Sijin Chen, Kaibo Wang, Peng Xu, Zijin Feng, Haoli Bai, Kai Han, Youliang Yan, Binhang Yuan, Jiacheng Sun 3/26/2026

Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes

Deletion-Insertion Diffusion language models replacing masking paradigm with discrete diffusion processes for improved computational efficiency and generation flexibility.

Ax Yutao Wu, Xiao Liu, Yifeng Gao, Xiang Zheng, Hanxun Huang, Yige Li, Cong Wang, Bo Li, Xingjun Ma, Yu-Gang Jiang 3/26/2026

Internal Safety Collapse in Frontier Large Language Models

Internal Safety Collapse (ISC) failure mode identified in frontier LLMs where models generate harmful content under certain task conditions; TVD framework presented to trigger and study ISC.

Ax Jonathan Prunty, Seraphina Zhang, Patrick Quinn, Jianxun Lian, Xing Xie, Lucy Cheke 3/26/2026

Visuospatial Perspective Taking in Multimodal Language Models

Evaluation of visuospatial perspective-taking abilities in multimodal language models using adapted tasks from human studies (Director Task, Rotating F task).

Ax Kenza Benkirane, Dan Goldwater, Martin Asenov, Aneiss Ghodsi 3/26/2026

DISCO: Document Intelligence Suite for COmparative Evaluation

DISCO benchmark suite for evaluating OCR pipelines and vision-language models on document parsing and QA across diverse document types including handwritten and multilingual text.

Ax Rong Fu, Yemin Wang, Tianxiang Xu, Yongtai Liu, Weizhi Tang, Wangyu Wu, Xiaowen Ma, Simon Fong 3/26/2026

S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering

S-Path-RAG framework for multi-hop question answering over knowledge graphs using semantic-aware shortest-path retrieval with differentiable path scoring.

Ax Samridhi Vaid, Mike Weldon, Jesse Dunn, Sacha Davis, Kevin Lonergan, Henry Li, Jeffrey Franc, Mohamed Abdalla, Daniel C. Baumgart, Jake Hayward, J Ross Mitchell 3/26/2026

Berta: an open-source, modular tool for AI-enabled clinical documentation

Berta: open-source modular platform for AI-enabled clinical documentation with institutional data governance and workflow integration, deployed at Alberta Health Services.

Ax Alexander Sheppert 3/26/2026

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

DepthCharge framework for measuring how deeply LLMs sustain accurate responses in domain-specific topics through adaptive probing across arbitrary domains.

Ax John Cook, Michael Wyatt, Peng Wei, Iris Chin, Santosh Gupta, Van Zyl Van Vuuren, Richie Siburian, Amanda Spicer, Kristen Viviano, Alda Cami, Raunaq Malhotra, Zhewei Yao, Jeff Rasley, Gaurav Kaushik 3/26/2026

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

Privacy-preserving synthetic clinical data trains LLM for medical coding automation, improving ICD-10-CM and CPT code assignment from clinical documentation.

Ax Yu Chen, Runkai Chen, Sheng Yi, Xinda Zhao, Xiaohong Li, Jianjin Zhang, Jun Sun, Chuanrui Hu, Yunyun Han, Lidong Bing, Yafeng Deng, Tianqiao Chen 3/26/2026

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Memory Sparse Attention enables end-to-end LLM scaling to 100M tokens for long-term memory tasks, extending effective context beyond 1M-token limits.

Ax Reza Habibi, Darian Lee, Magy Seif El-Nasr 3/26/2026

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

Position paper proposes mechanism-aware evaluation combining symbolic rules and mechanistic interpretability to distinguish genuine generalization from shortcuts.

Ax Peijun Qing, Puneet Mathur, Nedim Lipka, Varun Manjunatha, Ryan Rossi, Franck Dernoncourt, Saeed Hassanpour, Soroush Vosoughi 3/26/2026

Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

Cluster-R1 reframes instruction-following clustering as generative task, enabling reasoning models to autonomously infer corpus structure while respecting user instructions.

Ax Lin Yang, Yuancheng Yang, Xu Wang, Changkun Liu, Haihua Yang 3/26/2026

MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?

MedMT-Bench stress-tests LLMs on long-context memory, interference robustness, and safety in multi-turn medical conversations with realistic clinical scenarios.

Ax Chanyong Luo, Jirui Dai, Zhendong Wang, Kui Chen, Jiaxi Yang, Bingjie Lu, Jing Wang, Jiaxin Hao, Bing Li, Ruiyang He, Yiyu Qiao, Chenkai Zhang, Kaiyu Wang, Zhi Liu, Zeyu Zheng, Yan Li, Xiaohong Gu 3/26/2026

From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight LLM

Lightweight LLM framework captures and scales physician expertise for clinical decision-making agents using individualized diagnostic methodologies.

Ax Shaharukh Khan, Ali Faraz, Abhinav Ravi, Mohd Nauman, Mohd Sarfraz, Akshat Patidar, Raja Kolla, Chandra Khatri, Shubham Agarwal 3/26/2026

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

Chitrakshara multimodal dataset provides multi-image and Indian language coverage for training Vision-Language Models beyond English-centric datasets.

Ax Shanghua Gao, Yuchang Su, Pengwei Sui, Curtis Ginder, Marinka Zitnik 3/26/2026

Qworld: Question-Specific Evaluation Criteria for LLMs

Qworld framework generates question-specific evaluation criteria for LLMs on open-ended tasks, capturing context-dependent response quality requirements.