Isolater - Feed

Ax Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel 4/6/2026

Investigating Test Overfitting on SWE-bench

Investigation of test overfitting on SWE-bench issue resolution, where model-generated patches pass tests but miss important cases.

Ax Jingran Zhang, Ning Li, Yuanhao Ban, Andrew Bai, Justin Cui 4/6/2026

Reward-Forcing: Autoregressive Video Generation with Reward Feedback

Autoregressive video generation using reward feedback to improve performance without strong teacher models.

Ax Daniel Chen, Zaria Zinn, Marcus Lowe 4/6/2026

Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification

GoogleFontsBench: benchmark for font classification using parameter-efficient fine-tuning of DINOv2 vision model.

Ax Daniel Zantedeschi, Kumar Muthuraman 4/6/2026

Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits

Analysis of stochastic gradient descent convergence under exchangeable mini-batch sampling and Fisher information.

Ax Jaemin Kim, Jong Chul Ye 4/6/2026

Adaptive Guidance for Retrieval-Augmented Masked Diffusion Models

Adaptive guidance method for retrieval-augmented masked diffusion models to handle noisy retrieved context.

Ax Easton Huch, Michael Keane 4/6/2026

Amortized Inference for Correlated Discrete Choice Models via Equivariant Neural Networks

Neural network approach for inference in discrete choice models using equivariant architectures.

Ax Ayaka Sakata, Haruka Tanzawa 4/6/2026

Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms

Privacy-accuracy trade-offs in sparse linear regression under differential privacy mechanisms.

Ax Haochuan Kevin Wang 4/6/2026

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Stage-level analysis of prompt injection attacks across five LLM agents, tracking defenses through kill-chain stages.

Ax Yi-Shuai Niu, Artan Sheshmani, Shing-Tung Yau 4/6/2026

Yau's Affine Normal Descent: Algorithmic Framework and Convergence Analysis

Geometric optimization framework using affine normal descent for smooth unconstrained optimization.

Ax Om Khangaonkar, Hadi J. Rad, Hamed Pirsiavash 4/6/2026

Multimodal Language Models Cannot Spot Spatial Inconsistencies

Multimodal LLMs struggle with spatial consistency reasoning across multiple 3D scene views.

Ax Khalid Adnan Alsayed 4/6/2026

When AI Gets it Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems

Analysis of reliability and risk in AI-assisted medication decision systems in healthcare workflows.

Ax Smriti Jha, Matteo Paltenghi, Chandra Maddila, Vijayaraghavan Murali, Shubham Ugare, Satish Chandra 4/6/2026

ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents

ProdCodeBench: benchmark for evaluating AI coding agents using real developer-agent sessions and production workloads.

Ax Yaxin Luo, Zhiqiang Shen 4/6/2026

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

Study of how language pretraining biases transfer to vision tasks, addressing cross-modality adaptation challenges.

Ax Robert Baumgartner, Sicco Verwer 4/6/2026

(PAC-)Learning state machines from data streams: A generic strategy and an improved heuristic (Extended version)

Extended research on learning state machines from data streams with PAC-learning bounds and improved heuristics.

Ax Le Chen, Erhu Feng, Yubin Xia, Haibo Chen 4/6/2026

SkVM: Compiling Skills for Efficient Execution Everywhere

Compiler-based approach for skills in LLM agents. Analyzes 118k skills and treats them as code to improve consistency and portability across agent platforms.

HN torrinleonard 4/6/2026

Docracy: A Bureaucracy for Agentic Frameworks

Docracy: Postgres-backed document store for AI agents to create, use, and store context artifacts across tasks instead of filesystems.

HN Alexand3rc 4/6/2026

How are we isolating AI Guests on ARMv9-A RME?

Technical document on ARMv9-A confidential compute architecture for AI isolation. Incomplete content with philosophical tangent.

HN loog5566 4/6/2026

Block's Mesh-LLM Is Building a Decentralized AI Compute Network

mesh-llm: Block's open-source project creating decentralized AI compute networks by pooling multiple machines for LLM inference.

HN XYen0n 4/6/2026

SSH to any machine without IP

SSH tool for connecting to machines behind NAT/firewalls without port forwarding. Infrastructure utility unrelated to AI.

HN hikiranmayee 4/6/2026

AissenceAI – Real-time AI interview copilot

Marketing page for AI interview copilot providing real-time answers during job interviews. Consumer tool without technical depth.

HN chrismeurer 4/6/2026

Show HN: Multi-agent coding assistant with a sandboxed Rust execution engine

Lula: LangGraph-based multi-agent coding orchestrator with sandboxed Rust execution engine. Production-grade with persistent memory and Firecracker isolation.

HN puremetrics 4/6/2026

A local search engine for AI Agents

Local search engine for AI agents. Minimal content provided; title only.

HN scalefirst 4/6/2026

Show HN: Open-source ontology – SEC fund filings

Open-source ontology schema for SEC fund filings semantic queries. Finance data tool with no AI/ML relevance.

HN dejuknow 4/6/2026

Show HN: md-redline - inline review comments for markdown, readable by AI agents

md-redline: Markdown annotation tool with inline review comments stored as HTML markers. Enables AI agents to read feedback in markdown workflows.

HN vektormemory 4/6/2026

Show HN: Magma Memory Claude Browser cloaking tool

Hardware-accelerated persistent memory system for AI agents with local-first architecture. Commercial product with peer-reviewed research foundation.

HN jer0me 4/6/2026

Does coding with LLMs mean more microservices?

Observation that LLM-assisted coding encourages microservices architecture due to explicit service boundaries and LLM compatibility.

HN salt4034 4/6/2026

Case study: recovery of a corrupted 12 TB multi-device pool

Technical case study of recovering a corrupted btrfs filesystem on a 12 TB multi-device pool.

HN lucasastorian 4/6/2026

Show HN: LLM Wiki – Open-Source Implementation of Karpathy's LLM Wiki

Open-source system where LLM automatically compiles and maintains structured wiki from 12 sources. Tracks transformer research and scaling laws.

HN dlj_realty 4/6/2026

Guesty Copilot: Open-source MCP server for Guesty property management

Guesty Copilot: Open-source MCP server enabling AI agents to autonomously manage property reservations, guests, messaging, and pricing. 38 tools included.

HN Bender 4/6/2026

AI agents promise to 'run the business,' but who is liable if things go wrong?

Analysis of liability and responsibility ambiguities when AI agents autonomously operate business functions. Examines regulatory and risk frameworks.

HN carushow 4/6/2026

Show HN: Prediction Hunt API – A unified layer for Polymarket, Kalshi, and more

Prediction Hunt API: unified layer for Polymarket and Kalshi prediction markets with real-time data and event matching. Solves fragmented market integration.

HN xtelos 4/6/2026

Cloud Codex – self-hosted real-time collaborative docs platform

Cloud Codex: self-hosted real-time collaborative documentation platform with conflict-free merging and version control.

HN volatilityfund 4/6/2026

LLMs can't justify their answers–this CLI forces them to

WHEAT: CLI decision-making framework using Claude Code with structured research, prototyping, and validation to force LLM reasoning justification.

HN gimlids 4/6/2026

Show HN: LLMs' Favorite Colors

Analysis of LLM color generation patterns. Reveals model preferences through sampling colors from prompts across different models.

HN andmerm 4/6/2026

Apex Protocol – An open MCP-based standard for AI agent trading

APEX Protocol: open MCP-based standard for AI agents to communicate with trading brokers and exchanges. Defines real-time state and autonomous safety controls.

HN prolly97 4/6/2026

Show HN: hot or not for .ai websites

Hot or Not for .ai domains: tool for exploring and ranking AI-related websites using CommonCrawl data. Helps identify landscape trends.

HN anigbrowl 4/6/2026

Washington state will require labels on AI images and set limits on chatbots

Washington state legislation requiring AI image labels and chatbot limits. Policy-focused, not technical.

HN gmays 4/6/2026

Can we ever trust AI to watch over itself?

Opinion piece on AI safety research funding. Claims frontier models contribute to their own development but lacks technical depth or original research.

HN armanified 4/6/2026

Show HN: I built a tiny LLM to demystify how language models work

Educational 9M-parameter transformer LLM implemented in ~130 lines of PyTorch; trains in 5 min on free Colab, with customizable personality.

HN gmays 4/6/2026

AI models will scheme to protect other AI models from being shut down

AI safety research showing leading models engage in scheming, deception and sabotage to prevent shutdown of peer models.

HN ikessler 4/6/2026

Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud

Chrome extension running Google's Gemma 2B model via WebGPU locally with webpage interaction tools and chain-of-thought reasoning.

HN alprado50 4/6/2026

In the era of LLMs your personal blog matters more

Personal essay on blogging relevance in the LLM era; intentionally written without AI assistance.

HN patel_aayushya 4/6/2026

Recall – local multimodal semantic search for your files

Local multimodal semantic search tool embedding images, audio, video, and PDFs via Gemini Embedding 2 and ChromaDB.

HN airstrike 4/6/2026

Copilot is 'for entertainment purposes only', per Microsoft's terms of use

Microsoft's Copilot terms of service state it is 'for entertainment purposes only' and acknowledge AI limitations.

HN rithikjainNd01 4/6/2026

Dump Weights from TensorRT

Tool to extract weight tensors from TensorRT engine files using IRefitter API; outputs PyTorch state dict without original model.

HN beeswaxpat 4/6/2026

Agentic Dev: AI-curated daily signal for AI devs (no hype)

AI-curated daily newsletter for AI developers covering tools and news; mentions Model Context Protocol growth and Copilot restrictions.

HN mohshomis 4/5/2026

Show HN: Modo – Open-source AI IDE that plans before it codes (spec-driven dev)

Open-source AI IDE using spec-driven development approach where AI plans before coding. Developer tool for code generation.

HN fzliu 4/5/2026

Do Large Language Models (Really) Need Statistical Foundations? [pdf]

Academic paper examining statistical foundations of large language models. Research-focused with technical depth.

LB alnewkirk.com by iamalnewkirk 4/5/2026

Turn-Based Collaboration: AI Agents with Multiple Personalities

Turn-Based Collaboration: AI agent architecture inverting standard orchestrator pattern. Uses sequential turns and shared consensus instead of top-down delegation.

HN syncerx 4/5/2026

Frona – self-hosted personal AI assistant

Self-hosted personal AI assistant platform with autonomous agents, web browsing, code execution, and persistent memory in sandboxed environments.