Show HN: S0 Tuning – +23.6pp on HumanEval by tuning state, not weights
S₀ Tuning: parameter-efficient fine-tuning method for hybrid recurrent-attention models achieving +23.6pp on HumanEval with zero inference overhead.
Tool for factual grounding and verification in LLM outputs. Minimal description available.
Essay on when developers should and shouldn't use LLMs in development. Balanced perspective on LLM integration in workflows.
Discussion about reducing corporate tone in LLM outputs. Community question without technical solution or research.
Mobile AI agent (Sova) controlling installed Android apps through natural language. Banned by Google for app automation capabilities.
Multi-agent system using 28 OpenClaw instances coordinating ops, marketing, and releases with self-correction and goal coordination.
Meta research on an adaptive ranking model that scales LLM-complexity ad recommendation systems while balancing inference speed, accuracy, and cost.
Dochia: API testing tool with agent skill generation for agentic build-test-fix workflows, produces OpenAPI specs and test reports.
Local-first desktop AI agent supporting autonomous coding, multi-agent teams, computer use, and 15+ model providers; source available under the BSL-1.1 license.
OmniVoice: multilingual text-to-speech model supporting 600+ languages using discrete non-autoregressive diffusion architecture.
Tool for aggregating scattered notes, metrics, and transcripts into shareable interactive reports.
Opinion piece on consumer frustration with customer service chatbots and AI refund processes.
Security scanner specifically designed to detect vulnerabilities in code generated by AI models and LLMs.
Open source database navigation tool using static analysis instead of AI, enabling SQL-free database queries with local execution.
SparrowDB: embedded Cypher-compatible graph database written in Rust, optimized for point lookups and aggregations rather than traversals.
SmallestAI integration with n8n workflow automation platform for building AI agents and RAG applications.
Minimal information provided; appears to be mythology-related content.
Open source MCP server exposing 225 Zabbix API tools for AI assistants to manage monitoring infrastructure via Model Context Protocol.
Cloudflare Workers AI adds support for large language models including Kimi K2.5, with infrastructure for building and deploying AI agents at scale.
Minimal information provided about the PlayCanvas and Gracia 4DGS project.
Smart data pruning pipeline aimed at improving AI data quality and preventing model collapse, as an alternative to post-training hallucination detection.
Free tool for running standardized UX questionnaires (SUS, UEQ, UMUX-Lite, NASA-TLX) with visualization.
AMD Lemonade: open source local LLM server for GPU/NPU with 2MB footprint, OpenAI API compatible, prioritizing privacy and offline execution.
Developer discusses building an alternative to the Granola.ai meeting-notes tool, with broader accessibility for non-corporate users.
Discussion claiming most advice from Claude Code AI assistant is measurably inaccurate.
OpenAI acquires TBPN media company to amplify AI conversation among builders and influencers.
Desktop pet companion application built using Claude Code's system prompts.
Appears to be spam/corrupted content about Airtable and EV site selection with mixed advertising elements.
WebGPU benchmarking tool with minimal description.
Emscripten system library enabling WebGPU access from C/C++ code compiled to WebAssembly.
Agentmatic is an AI agent platform that generates full marketing campaigns from prompts with persistent brand memory across sessions.
Memsearch provides persistent cross-session semantic memory for AI coding agents with zero-configuration plugin installation.
Ask HN: Users share experiences selecting LLMs for an agentic software development lifecycle, with specific use cases.
SideX: Tauri-based VS Code port replacing Electron with a native backend; 96% smaller, open source, and in early-stage development.
Analysis of 138 practitioner conference talks on how companies adopt AI agent architectures, covering architectural patterns and the implementation of LLM-driven agentic systems.
arXiv paper: Diversity-aware reverse KL (RKL) divergence improves LLM distillation by focusing on dominant modes in teacher-student training.
MAC-Attention: acceleration technique for long-context LLM decoding by reusing prior attention computations for semantically similar tokens without compression.
REM-CTX: RL-based peer review system using 8B LLM with Group Relative Policy Optimization, incorporating visual figures and scholarly context.
Apprenticeship learning approach for inducing pedagogical policies from imperfect, evolving student demonstrations in e-learning environments.
Systematic evaluation of LLMs for educational essay scoring across holistic and analytic rubrics, analyzing human alignment and bias.
QAsk-Nav benchmark for evaluating embodied agents combining navigation and dialogue-based question-asking for collaborative object finding tasks.
Energy-based models framework for physical system identification with formal stability guarantees, applying to Port-Hamiltonian dynamics.
Research on reducing modality gap in Vision-Language Models like CLIP through geometric analysis to improve cross-modal tasks like captioning and clustering.
VeriAct: agentic system that uses LLMs to synthesize formal specifications that are correct and complete, not merely verifier-passing.
SANA-I2I: text-free flow matching framework for paired image-to-image translation with application to fetal MRI artifact reduction.
Asymmetric Actor-Critic method for improving reliability of multi-turn LLM agents in one-shot settings without requiring model retraining.
CASA: conditional decoding strategy for robust multimodal safety in MLLMs against cross-modal attacks.
Prompt-guided image compression for Vision-Language Models optimized for downstream VLM tasks rather than human perception.
Research on vulnerabilities in aligned AI agents with filesystem and email access; introduces ACDC for automated circuit discovery in transformers.
RAGShield: five-layer defense against knowledge base poisoning attacks in RAG systems deployed across federal agencies.