Isolater - Feed

HN justiceforsaas 7/1/2026

GLM-5.2's Code Reviews Are Only as Good as Your Prompt

Analysis of the open-weight GLM-5.2 model's inconsistent code review performance and its sensitivity to prompts.

HN testofschool 7/1/2026

Why averaging LLM benchmark scores is fundamentally broken

Research paper analyzing fundamental problems with averaging LLM benchmark scores for model evaluation.

HN dutchcode 7/1/2026

Show HN: Bol.ai – Extract structured data from Bills of Lading

LLM-powered document extraction tool converting Bills of Lading and logistics documents to structured data.

HN sarreph 7/1/2026

Show HN: Pokayoke – turn repo conventions into deterministic checks for agents

Tool that converts repository conventions into deterministic checks that AI agents can manage and enforce.

HN theaniketmaurya 7/1/2026

Show HN: Petabyte-scale storage for AI agent sandboxes

Celesto enables petabyte-scale persistent storage for AI agent sandboxes, useful for coding agents and large file handling.

HN aman-flyprox 7/1/2026

Siplinx AI Meeting Notetaker for Zoom, Google Meet

Siplinx AI runs local LLM and speech-to-text models on-device for meeting transcription and note-taking, without cloud services.

HN Timmyzzz 7/1/2026

Best AI Coding Token Plans in 2026: A Practical Comparison for Developers

Comparison guide of AI coding assistant subscription plans for 2026 with focus on developer workflows and value.

HN zubairov 7/1/2026

Show HN: Agentic Data Engineering

Agentic data engineering: using LLMs to generate SQL queries and automate analytics workflows. Practical workflow example.

HN Abhavk 7/1/2026

Show HN: Erlangchain – A tiny Erlang client for LLMs

Erlangchain: lightweight Erlang library for OpenAI/Anthropic LLM calls with tool-use and multimodal support, zero third-party dependencies.

HN saravanan2294 7/1/2026

Show HN: World Model MCP v0.10.0 – cross-runtime memory across 7 coding agents

MCP tool providing cross-runtime temporal knowledge graph memory for 7 AI coding agents, with constraint validation and context re-injection.

HN tommyjepsen 7/1/2026

Supervised vs. Unsupervised AI-generated code

Essay on supervised vs. unsupervised AI-generated code: comparing code reviewed by a human in the loop with code shipped without review.

HN joozio 7/1/2026

New attack provides one more reason why AI browsers are a bad idea

Security analysis of AI browser vulnerabilities: LLMs can be tricked via prompt injection to execute forbidden actions.

HN mpfect 7/1/2026

Liquid AI releases a 230M model optimized for phones, Raspberry Pi, and robots

Liquid AI releases LFM2.5-230M, a 230M-parameter model optimized for edge devices, with fast inference for agentic tool-use workflows.

HN atshu21 7/1/2026

Ragit – chat with any folder of documents using a local LLM

Ragit is a local RAG CLI tool for chatting with document folders using Ollama. Enables offline LLM document retrieval without API keys.

HN yogisotho 7/1/2026

Where do you answer "is the agent allowed to do this?" – one place, or every adapter?

Rust-based unified agent substrate framework for governance and orchestration across multiple systems.

HN ggaswint 7/1/2026

Show HN: Aegize (trying to mitigate the risk of AI)

Open-source Aegize project implementing a security layer for AI tools via identity, policy, and permission controls.

HN backlit4034 7/1/2026

The Job archetypes of the future, according to Claude Code's creator

Claude Code creator Boris Cherny discusses five job archetypes emerging in the AI era: builder, operator, explorer, etc.

Ax Brent A. Griffin, Jason J. Corso 7/1/2026

The Label Imitation Game: Turing Test Network for Zero-Shot Pseudo-Label Pruning

Label Imitation Game framework uses adversarial interrogation to prune hallucinations in foundation model pseudo-labels without standard thresholds.

Ax Mizanur Rahman, Abeer Badawi, Elahe Rahimi, Laleh Seyyed-Kalantari, Frank Rudzicz, Enamul Hoque, Elham Dolatabadi 7/1/2026

Training Therapeutic Judges and Multi-Agent Systems for Human-Aligned Mental Health Support

arXiv paper introducing TheraJudge framework using multi-agent systems and human-aligned evaluation for mental health support with LLMs.

Ax Arash Raftari, Mehrdad Mahdavi, Nathan Blackthorn, Andrew Arash Mahyari 7/1/2026

Curvature-Guided Module Localization for Low-Rank Detoxification of Backdoored Large Language Models

arXiv research on post-hoc detoxification of backdoored LLMs using curvature-guided module localization and low-rank repair.

Ax Soham De, Isaac Slaughter, Jiawei Guo, Qiao-Yun Cheng, Jiayuan Yan, Sruti Banerjee, Martin Saveski 7/1/2026

How Human Feedback Shapes AI-generated Community Notes

Analysis of how human feedback shapes AI-generated Community Notes in X's crowd-sourced fact-checking system extended with collaborative AI features.

Ax Wei Geng, Nitinder Mohan, Jörg Ott 7/1/2026

Budget-Adaptive Routing: Skipping the Weak When the Strong Answers Anyway

Budget-adaptive routing system for edge-cloud inference that skips weak model inference when offload budget allows, optimizing based on varying computational constraints.

Ax Ved Sriraman, Peihan Liu, Daniel Hsu, Adam Block 7/1/2026

Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback

Theoretical analysis explaining why on-policy distillation outperforms offline imitation learning with noisy expert feedback, with implications for language model training.

Ax Mohammad Nour Al Awad, Sergey Ivanov 7/1/2026

Loc2Repair: A Framework for Evaluating the Impact of File-Level Issue Localization in Repo-Level LLM Repair

Loc2Repair is a modular evaluation framework for repository-grounded LLM repair systems that isolates file-level issue localization as a key failure mode.

Ax Naihao Deng, Yilun Zhu, Joan Nwatu, Clayton Scott, Rada Mihalcea 7/1/2026

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

Analysis of deductive stereotyping, a failure mode in which LLMs apply population-level statistics to individuals, with the Fair-GCG mitigation approach.

Ax Stefano Calzolari, Rubens Montanha, Gabriel Schneider, Gustavo Wide, Paulo Knob, Francesco Strada, Andrea Bottino, Soraia Raupp Musse 7/1/2026

LLM-Driven Personalities for Decision Making in Emergency Simulations

Framework for using LLMs to drive decision-making behavior in virtual agents for emergency simulations, enabling believable autonomous agent behavior in interactive environments.

Ax Gaurab Baral, Aaditya Khanal, Yangyang Tao, Junxiu Zhou 7/1/2026

Knowledge Distillation from Large Reasoning Models to Compact Student Models: A Case Study on the John O Bryan Mathematics Competition

Knowledge distillation study from DeepSeek-R1 to Qwen2.5-7B using Chain-of-Thought training corpus from math competition problems with LoRA fine-tuning on Apple Silicon.

Ax Zhiyuan Yao, Zheren Fu, Zhixiao Zheng, Jiajun Li, Yi Tu, Zhendong Mao 7/1/2026

ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs

ADAPT method mitigating hallucinations in multimodal LLMs by aligning attention dynamics and using preference tuning to improve text-to-image cross-attention during generation.

Ax Guangsheng Bao, Lihua Rong, Yanbin Zhao, Xiao Yu, Qiji Zhou, Yue Zhang 7/1/2026

Triospect: A Three-Dimensional Framework for Robust Statistical AI-Generated Text Detection Against Diverse Attacks

Triospect framework for detecting AI-generated text by analyzing content, expression, and stylistic elements, robust against 17 attack types across multiple domains.

Ax Orian Dabod, Amir Cohen, Gabriel Stanovsky 7/1/2026

When Reranking Hurts: Uncertainty-Based Gating for Few-Shot Reranking

Training-free gated reranking method that uses model uncertainty to decide whether reranking few-shot examples improves LLM performance across NLU and machine translation tasks.

Ax Anindya Jana, Snehasis Banerjee, Arup Sadhu, Ranjan Dasgupta 7/1/2026

A Modular Vision-Language-Action Robotics Framework for Indoor Environments

Modular vision-language-action robotics framework for autonomous agents performing complex indoor tasks from natural language instructions.

Ax Duc Cao Dinh, Khai Le-Duc, Florent Draye, Chris Ngo, Terry Jingchen Zhang, Bernhard Schölkopf, Zhijing Jin 7/1/2026

PruneGround: Plug-and-play Spatial Pruning for 3D Visual Grounding

PruneGround method using spatial pruning to improve efficiency and accuracy in 3D visual grounding by focusing on relevant scene regions.

Ax Apurva Gandhi, Vishwas Suryanarayanan, Raja Hasnain Anwar, Firoz Shaik, Shubhang Desai, Thong Q. Nguyen, Muhammad Taqi Raza, Vishal Chowdhary, Graham Neubig 7/1/2026

PPT-Eval: A Benchmark for Computer-Use Agents on PowerPoint Tasks

PPT-Eval benchmark with 120 PowerPoint tasks to evaluate computer-use agents on content creation and presentation editing scenarios.

Ax Shivam Ratnakar, Yixuan Zhu, Cecilia Cheng, Chaya Vijayakumar 7/1/2026

One Retrieval to Cover Them All: Co-occurrence-Aware Knowledge Base Reorganization for Session-Level RAG

Session-level RAG system reorganizing knowledge bases with co-occurrence clustering to cover multi-question user sessions beyond single-query retrieval.

Ax Snehasis Banerjee, Ranjan Dasgupta 7/1/2026

LLM-Powered Interactive Robotic Action Synthesis from Multimodal Speech, Gestures, and Music

Framework using LLMs to synthesize robotic actions from multimodal inputs including speech, gestures, and music for human-robot interaction.

Ax Abhishek Dey 7/1/2026

ComplianceGate: Classifier-Gated Multi-Tier LLM Routing for Inference in Regulated Industries

ComplianceGate system routes LLM queries through multi-tier classifiers to enforce compliance and cost efficiency in regulated industries while protecting PII.

Ax Hao Sun, Yu Song, Shiyu Teng, Ziwei Niu, Yen-Wei Chen 7/1/2026

MIRTH: Mutual-Information Reasoning with Temporal Hubs for Vision-Language-Action Agents

MIRTH framework for vision-language-action robotic agents addressing temporal understanding, reasoning gaps, and inference efficiency in physical control tasks.

Ax Woosung Kim, Youngjun Suh, Jinho Lee, Jongmin Lee, Byung-Jun Lee 7/1/2026

AETDICE: Unified Framework and Offline Optimization for Nonlinear Multi-Objective RL

Unified offline reinforcement learning framework for nonlinear multi-objective optimization capturing complex trade-offs like risk and fairness.

Ax Jongchan Choi, Nari Yang, Sung Soo Park, Jaemin Cho, Han Seoyoung, Haerin Shin, Jun-Hyung Park 7/1/2026

Can LLMs Imagine Moral Alternatives Beyond Binary Dilemmas?

Dataset and evaluation of LLMs' ability to imagine moral alternatives beyond binary dilemmas in ethical reasoning tasks.

Ax Noah Scharrenberg, Chang Sun 7/1/2026

Probing Stylistic Appropriation using Large Language Models: An Evaluation Framework for Copyright Infringement under EU Law

Evaluation framework using LLMs to detect stylistic copyright infringement in generated text beyond verbatim memorization detection.

Ax Binh Mai, Tran Quoc Bao Le, Hung Dinh, Cong Tran 7/1/2026

SwiftAudio: Data-Efficient Caption-Only Distillation for One-Step Text-to-Audio Diffusion-based Generation

One-step text-to-audio distillation framework using caption-only training data without paired audio for efficient generation.

Ax Vasileios C. Pezoulas, Nikolaos S. Tachos, Eleni Georga, Kostas Marias, Manolis Tsiknakis, Dimitrios I. Fotiadis 7/1/2026

TDGT: A Tabular Data Generation Toolkit supporting adaptive GPU-accelerated Bayesian mixture models, diffusion-based models, and latent-space generative modeling

Web-based toolkit for synthetic tabular data generation supporting Bayesian mixture models, diffusion, and latent-space generative modeling.

Ax Xueqiao Sun, Xiaohan Wang, Ludwig Schmidt, Serena Yeung-Levy, Yuhui Zhang 7/1/2026