HN sorenbs 25d ago

GLM 5.1: Pelican Test

GLM-5.1 is a 754B-parameter open-source LLM that demonstrates improved reasoning and multi-modal capabilities, such as unprompted SVG+CSS generation.

HN saikatsg 25d ago

Your parallel Agent limit

Analysis of cognitive load and limitations when managing multiple parallel AI agents, focusing on human-in-the-loop costs beyond throughput metrics.

HN motakuk 25d ago

Enterprise-Managed Authorization for MCP

Enterprise authorization system for Model Context Protocol (MCP) servers using centralized identity providers. Addresses deployment challenges in large organizations.

HN hunglee2 25d ago

China's AI Ethics Governance

Newsletter promotion about AI ethics governance in China. Mostly self-promotional content with no technical depth or original research.

HN gpi 25d ago

Two Years of Valkey

Retrospective on Valkey, the open-source Redis fork created two years ago after Redis changed its license to a source-available model.

HN MrBuddyCasino 25d ago

"I started to lose my ability to code"

Incomplete article about losing coding ability. The content is truncated and carries no substantive information; likely a newsletter signup page.

HN perch56 25d ago

EU's Exposed AI Infrastructure

Security analysis: 25,000+ publicly exposed Ollama instances found in April 2026, a 22x increase since September 2025, raising infrastructure security concerns.

BL 25d ago

Introducing the Child Safety Blueprint

OpenAI announces the Child Safety Blueprint, a framework for combating AI-enabled child sexual exploitation, developed with NCMEC and law enforcement partners.

HN LexSiga 25d ago

Ducklake Demo

Open-source lakehouse demo using DuckDB, dlt, and dbt. Complete runnable example of an ELT pipeline with Parquet files and analytics transformations.

Ax Min Sun (F. Hoffmann-La Roche AG, Roche Pharma Research and Early Development), Federica Storti (F. Hoffmann-La Roche AG, Roche Pharma Research and Early Development), Valentina Martino (F. Hoffmann-La Roche AG, Roche Pharma Research and Early Development), Miguel Gonzalez-Andrades (F. Hoffmann-La Roche AG, Roche Pharma Research and Early Development), Tony Kam-Thong (F. Hoffmann-La Roche AG, Roche Pharma Research and Early Development) 25d ago

Algebraic Structure Discovery for Real World Combinatorial Optimisation Problems: A General Framework from Abstract Algebra to Quotient Space Learning

Framework identifies algebraic structures in combinatorial optimization problems and constructs quotient spaces to reduce the search space and improve solution quality.

Ax Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, Tim Strother, Rahul Thapa, Yong Cheng, Preeti Singh, Kat Black, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Joelle Barral, Tris Warkentin, Shravya Shetty, Dale Webster, Sunny Virmani, David F. Steiner, Can Kirmizibayrak, Daniel Golden 25d ago

MedGemma 1.5 Technical Report

MedGemma 1.5 4B model expands medical capabilities with high-dimensional imaging (CT/MRI/histopathology), anatomical localization, and improved document understanding.

Ax Xiangyi Li, Kyoung Whan Choe, Yimin Liu, Xiaokun Chen, Chujun Tao, Bingran You, Wenbo Chen, Zonglin Di, Jiankai Sun, Shenghan Zheng, Jiajun Bao, Yuanli Wang, Weixiang Yan, Yiyuan Li, Han-chung Lee 25d ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

ClawsBench benchmark evaluates LLM agents on realistic productivity tasks (email, scheduling, documents) in simulated multi-service environments with stateful workflows.

Ax Eliza Berman, Bella Chang, Daniel B. Neill, Emily Black 25d ago

Attribution Bias in Large Language Models

AttriBench: Demographically balanced benchmark for measuring attribution bias in LLMs when attributing quotes to their original authors.

Ax Hangoo Kang, Tarun Suresh, Jon Saad-Falcon, Azalia Mirhoseini 25d ago

TRACE: Capability-Targeted Agentic Training

TRACE: Framework for targeted training of LLM agents on capability gaps identified in specific environments and task distributions.

Ax Md Atik Ahamed, Mihir Parmar, Palash Goyal, Yiwen Song, Long T. Le, Qiang Cheng, Chun-Liang Li, Hamid Palangi, Jinsung Yoon, Tomas Pfister 25d ago

TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

TFRBench: Benchmark for evaluating reasoning capabilities of time-series forecasting systems beyond numerical accuracy metrics.