Isolater - Feed

HN rantingdemon 25d ago

Anthropic's latest AI model could let hackers carry out attacks faster

Anthropic provides Mythos model to major tech companies for cybersecurity testing and vulnerability discovery.

HN Davincios 25d ago

Open-source weekly live coding session (Thur 5PM)

Open-source framework for AI SRE agents that integrate 40+ infrastructure tools to autonomously investigate and resolve production incidents.

HN stealthybox 25d ago

Why GitOps still matters in a world of AI agents (FluxCon) [video]

Video presentation on GitOps relevance and practices in systems managed by AI agents, from FluxCon conference.

HN qingant 25d ago

Yu – Sandboxes your Claude Code/Codex with zero credential exposure

Yu is a sandboxing tool that isolates Claude Code and Codex execution to prevent credential exposure from compromised code or dependencies.

HN sorenbs 25d ago

GLM 5.1: Pelican Test

GLM-5.1 is a 754B parameter open-source LLM that demonstrates improved reasoning and multi-modal capabilities like unprompted SVG+CSS generation.

HN saikatsg 25d ago

Your parallel Agent limit

Analysis of cognitive load and limitations when managing multiple parallel AI agents, focusing on human-in-the-loop costs beyond throughput metrics.

HN beardyw 25d ago

Row over 'virtual gated community' AI surveillance plan in Toronto neighbourhood

News article on a Toronto neighborhood's debate over an AI-powered license-plate-scanning surveillance system intended to combat property crime.

HN motakuk 25d ago

Enterprise-Managed Authorization for MCP

Enterprise authorization system for Model Context Protocol (MCP) servers using centralized identity providers. Addresses deployment challenges in large organizations.

HN rs545837 25d ago

Structural and semantic component for improving code reviews with local models

Research on improving code reviews by adding a semantic analysis layer to local LLMs, providing contextual function and type information beyond diffs.

HN hunglee2 25d ago

China's AI Ethics Governance

Newsletter promotion about AI ethics governance in China. Mostly self-promotional content with no technical depth or original research.

HN jnord 25d ago

Google launched an AI dictation app that works offline

Google releases offline-first dictation app using Gemma-based ASR models. Open-source LLM application for speech recognition on consumer hardware.

HN gpi 25d ago

Two Years of Valkey

Retrospective on Valkey, the open-source Redis fork created two years ago after Redis changed its license to a source-available model.

HN nta25297 25d ago

Optinum – finds the blind spots AI coding agents systematically miss in PR tests

Tool that detects blind spots in the pull request tests written by AI coding agents by analyzing API and database boundary changes. Addresses integration testing gaps.

HN elliptic1 25d ago

Show HN: Voiceplan.it – The New Planning Mode

Voice-first AI planning tool with MCP integration. Conversational AI agent for strategic planning that generates structured documents in real time.

HN psychip 25d ago

GPU-resident vector database under 300kb

GPU-resident vector database (~300KB executable) supporting 12M vectors with ~10ms query latency, TCP interface, no dependencies.

HN moodiverse 25d ago

Show HN: I'm trying to get 100 users in 24 hours with this simple idea

Buildfeed is a simple social platform for sharing projects without launch pressure, targeting 100 users in 24 hours.

HN MrBuddyCasino 25d ago

"I started to lose my ability to code"

Incomplete article about losing coding ability. Truncated content without substantive information. Likely a newsletter signup page.

HN perch56 25d ago

EU's Exposed AI Infrastructure

Security analysis: 25,000+ publicly exposed Ollama instances found in April 2026, a 22x increase from September 2025, raising infrastructure security concerns.

BL 25d ago

Introducing the Child Safety Blueprint

OpenAI announces Child Safety Blueprint framework for combating AI-enabled child sexual exploitation, developed with NCMEC and law enforcement partners.

HN LexSiga 25d ago

Ducklake Demo

Open-source lakehouse demo using DuckDB, dlt, and dbt. Complete runnable example of ELT pipeline with parquet files and analytics transformation.

HN niseus 25d ago

Show HN: KOS Protocol – A kos.json file for AI agents to read verified facts

KOS Protocol: open standard for publishing machine-readable verified facts with provenance tracking and freshness decay. Addresses AI hallucination via structured data.

HN usestork 25d ago

Show HN: The Spotify for AI Agents – StarSinger MCP

AI agent platform that positions itself as a "Spotify for agents", using the Model Context Protocol (MCP).

Ax Sharath Sathish 25d ago

Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya

Research on fine-tuning LLMs for epistemic reasoning using Navya-Nyaya logic. Addresses hallucination and brittleness in LLM reasoning capabilities.

Ax Enso O. Torres Alegre, Diana E. Mora Jimenez 25d ago

Operational Noncommutativity in Sequential Metacognitive Judgments

Theoretical framework exploring order effects in sequential cognitive processes and non-commutativity in metacognition using operational methods.

Ax Volodymyr Yuzefovych 25d ago

Proximity Measure of Information Object Features for Solving the Problem of Their Identification in Information Systems

Proximity measure quantifies similarity of multi-source information object features for entity identification and matching across heterogeneous data sources.

Ax Cuong Van Duc, Minh Nguyen Dinh Tuan, Tam Vu Duc, Tung Vu Duy, Son Nguyen Van, Hanh Nguyen Thi, Binh Huynh Thi Thanh 25d ago

ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback

ReVEL is a hybrid framework that uses LLM-guided iterative evolution with structured performance feedback to design effective heuristics for NP-hard problems.

Ax Min Sun, Federica Storti, Valentina Martino, Miguel Gonzalez-Andrades, Tony Kam-Thong (all F. Hoffmann-La Roche AG, Roche Pharma Research and Early Development) 25d ago

Algebraic Structure Discovery for Real World Combinatorial Optimisation Problems: A General Framework from Abstract Algebra to Quotient Space Learning

Framework that identifies algebraic structures in combinatorial optimization problems and constructs quotient spaces to reduce the search space and improve solution quality.

Ax Yiwen Song, Yale Song, Tomas Pfister, Jinsung Yoon 25d ago

PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing

PaperOrchestra multi-agent framework automates AI research paper writing by transforming unstructured materials into submission-ready LaTeX manuscripts.

Ax Frazier N. Baker, Trieu Nguyen, Reza Averly, Botao Yu, Daniel Adu-Ampratwum, Huan Sun, Xia Ning 25d ago

MMORF: A Multi-agent Framework for Designing Multi-objective Retrosynthesis Planning Systems

MMORF multi-agent framework uses language models with specialized agents for multi-objective retrosynthesis planning balancing quality, safety, and cost.

Ax Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, Tim Strother, Rahul Thapa, Yong Cheng, Preeti Singh, Kat Black, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Joelle Barral, Tris Warkentin, Shravya Shetty, Dale Webster, Sunny Virmani, David F. Steiner, Can Kirmizibayrak, Daniel Golden 25d ago

MedGemma 1.5 Technical Report

MedGemma 1.5 4B model expands medical capabilities with high-dimensional imaging (CT/MRI/histopathology), anatomical localization, and improved document understanding.

Ax Xuyang Shen, Haoran Liu, Dongjin Song, Martin Renqiang Min 25d ago

Uncertainty-Guided Latent Diagnostic Trajectory Learning for Sequential Clinical Diagnosis

LLM-based sequential clinical diagnosis system models uncertainty-guided evidence acquisition over time using diagnostic trajectory learning.

Ax Jose L. Salmeron 25d ago

Non-monotonic causal discovery with Kolmogorov-Arnold Fuzzy Cognitive Maps

Kolmogorov-Arnold Fuzzy Cognitive Maps extend neuro-symbolic modeling to handle non-monotonic causal dependencies in complex dynamic systems.

Ax Rongqian Chen, Yu Li, Zeyu Fang, Sizhe Tang, Weidong Cao, Tian Lan 25d ago

IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents

IntentScore is a plan-aware reward model trained on 398K offline GUI interactions to evaluate and score actions for computer-use agents across multiple operating systems.

Ax Hieu Le, Oguz Bedir, Mostafa Ibrahim, Jian Tao, Sabit Ekin 25d ago

Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays

Multi-agent reinforcement learning replaces channel modeling with spatial intelligence for autonomous control of reconfigurable intelligent surface arrays.

Ax Hieu Le, Mostafa Ibrahim, Oguz Bedir, Jian Tao, Sabit Ekin 25d ago

Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors

Hierarchical multi-agent reinforcement learning optimizes reconfigurable intelligent surfaces for mmWave networks without channel state information estimation.

Ax Ahmad Maroof Karimi, Jong Youl Choi, Charles Qing Cao, Awais Khan 25d ago

Instruction-Tuned LLMs for Parsing and Mining Unstructured Logs on Leadership HPC Systems

Instruction-tuned LLMs parse and mine unstructured HPC system logs from heterogeneous sources to extract patterns and diagnose operational issues.

Ax Xiangyi Li, Kyoung Whan Choe, Yimin Liu, Xiaokun Chen, Chujun Tao, Bingran You, Wenbo Chen, Zonglin Di, Jiankai Sun, Shenghan Zheng, Jiajun Bao, Yuanli Wang, Weixiang Yan, Yiyuan Li, Han-chung Lee 25d ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

ClawsBench benchmark evaluates LLM agents on realistic productivity tasks (email, scheduling, documents) in simulated multi-service environments with stateful workflows.

Ax Eliza Berman, Bella Chang, Daniel B. Neill, Emily Black 25d ago

Attribution Bias in Large Language Models

AttriBench: Demographically balanced benchmark for measuring attribution bias in LLMs when attributing quotes to their original authors.

Ax Christopher Koch 25d ago

From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI

Framework for translating governance norms into enforceable runtime guardrails for agentic AI systems with multi-step execution.

Ax Zhiming Xue, Menghao Huo, Yujue Wang 25d ago

EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks

Graph neural network approach for predicting delivery delays in logistics networks using warehouse and transportation data.

Ax Jonathan Elsworth Eicher 25d ago

Simulating the Evolution of Alignment and Values in Machine Intelligence

Simulation grounded in evolutionary theory of how alignment affects populations of AI models over time, including belief propagation dynamics.

Ax Muhammad Ahmed Mohsin, Ahsan Bilal, Muhammad Umer, Emily Fox 25d ago

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

Reward decomposition approach to disentangle pressure capitulation from evidence blindness in LLM sycophancy behavior.

Ax Lesong Tao, Yifei Wang, Haodong Jing, Jingwen Fu, Miao Kang, Shitao Chen, Nanning Zheng 25d ago

Breakthrough the Suboptimal Stable Point in Value-Factorization-Based Multi-Agent Reinforcement Learning

Theoretical analysis and solutions for value factorization convergence to suboptimal stable points in multi-agent reinforcement learning.

Ax Dawei Li, Zongxia Li, Hongyang Du, Xiyang Wu, Shihang Gui, Yongbei Kuang, Lichao Sun 25d ago

Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills

Graph of Skills: Dependency-aware skill retrieval system for managing and scaling thousands of reusable skills in agent systems.

Ax Hangoo Kang, Tarun Suresh, Jon Saad-Falcon, Azalia Mirhoseini 25d ago

TRACE: Capability-Targeted Agentic Training

TRACE: Framework for targeted training of LLM agents on capability gaps identified in specific environments and task distributions.

Ax Aisvarya Adeseye, Jouni Isoaho, Seppo Virtanen, Mohammad Tahir 25d ago

Dynamic Agentic AI Expert Profiler System Architecture for Multidomain Intelligence Modeling

Agentic AI system that profiles user expertise levels to adapt interaction depth, using a LLaMA-based modular architecture.

Ax Zhe Yu, Wenpeng Xing, Meng Han 25d ago

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

RETINA-SAFE benchmark and ECRT framework for detecting hallucination risks in medical LLMs when evidence is insufficient or conflicting.

Ax Xuan Xiong, Huan Liu, Li Gu, Zhixiang Chi, Yue Qiu, Yuanhao Yu, Yang Wang 25d ago

ETR: Entropy Trend Reward for Efficient Chain-of-Thought Reasoning

ETR: Training method for efficient chain-of-thought reasoning by optimizing entropy trends rather than global uncertainty reduction.

Ax Zhe Yu, Wenpeng Xing, Meng Han 25d ago

LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment

LatentAudit: White-box monitoring system for RAG hallucination detection using Mahalanobis distance on residual stream activations.

Ax Md Atik Ahamed, Mihir Parmar, Palash Goyal, Yiwen Song, Long T. Le, Qiang Cheng, Chun-Liang Li, Hamid Palangi, Jinsung Yoon, Tomas Pfister 25d ago

TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

TFRBench: Benchmark for evaluating reasoning capabilities of time-series forecasting systems beyond numerical accuracy metrics.