Show HN: EvalLens – Open-source tool to evaluate structured LLM outputs
Open-source evaluation tool for structured LLM outputs with schema validation, failure taxonomy, and dataset comparison beyond binary pass/fail metrics.
Personal essay about using AI with a persistent memory system called CORTEX, reflecting on AI limitations through the genie parable.
LiveKit integration with Telnyx infrastructure for hosting voice AI agents with 50% cost reduction and low-latency STT/TTS.
Personal blog post about startup founder experiences and product-market fit challenges.
Philosophical essay exploring what it means for language models to have subjective experience, building on Nagel's 'What is it like to be a bat?'
iOS dictation app using AI to clean speech-to-text output by removing filler words and editing transcriptions.
Emacs package providing native Codex IDE integration with MCP bridge support for direct editor context access.
Vajra is a background coding agent that polls Linear issues and autonomously generates pull requests through multi-stage AI workflows (plan, code, review, publish).
Apple Studio Display XDR receives FDA clearance for a medical imaging feature to support radiologist workflows.
Identity infrastructure framework for autonomous agents handling credentials, delegation, and permissions using OAuth 2.1 and SPIFFE standards.
P: AWS state machine language for formally modeling distributed systems with PeasyAI for AI-assisted code generation via Claude.
VOID: Netflix's physics-aware video editing tool using VLM reasoning to identify causal effects and guide diffusion for object removal.
Technical guide on three memory architecture approaches for AI companions: pgvector semantic memory, scratchpad, and filesystem-based context.
Article discussing data needs for understanding AI's impact on employment, mentioning researcher perspectives on economic disruption.
Research paper studying how AI aggregation affects social learning and knowledge via an extension of the DeGroot model.
Fashion app using Meta's Segment Anything model to digitize wardrobes and suggest outfit combinations via natural language.
Email infrastructure API with SDKs for multiple languages, positioning itself as a unified alternative to Postmark/Mailchimp for AI-native applications.
Early-stage PHP package manager, an alternative to Composer with improved performance.
Technical deep-dive implementing bfloat16 floating point arithmetic from scratch, covering design challenges.
JitAPI is an MCP server enabling Claude to dynamically interact with APIs via semantic search over OpenAPI specs, reducing token usage by 34x.
Personal account of returning to work after illness, asking about AI tools for accessibility.
Tool to use GitHub Container Registry as a Nix binary cache via OCI blobs and GitHub Actions.
Web tool implementing Paul Graham's intellectual CAPTCHA concept using logic puzzles and fact-checking to improve social network discourse.
Personal project to open-source medical records as an LLM wiki for rare neurological condition research, still at an early stage with privacy concerns.
Claim of building a cognitive architecture on a Mac with 60+ modules and an IIT-based consciousness simulation.
Freestyle: cloud platform providing sandboxes and infrastructure for running AI coding agents at scale.
Complaint about Google suspending a startup's email account without human support.
PostgreSQL extension in Rust for full-text search in Kazakh language, handling agglutinative morphology via PGRX.
Opinion piece on misuse of code coverage metrics and how to improve measurement practices.
Android app using NFC tags to block distracting apps with physical friction. Minimalist design.
Ninthwave: Orchestration layer for parallel AI coding that generates reviewable PRs while maintaining user control and existing tools.
Deep Extract: Agent-in-the-loop extraction tool with autonomous verification cycles for structured data extraction accuracy.
Forge tool to replace manual context management in Claude Code. Addresses context window limitations and project continuity for AI-assisted development.
Satirical dialogue between LLMs discussing global crises. Opinion piece, not technical content.
Workspace manager for parallel coding agent development. Linux GTK app and CLI for managing git repos, worktrees, and isolated terminal sessions.
Claude agent that automatically analyzes iOS/Mac performance traces from Instruments. Exports to DuckDB for SQL-based diagnostics.
Static analysis tool that builds dependency graphs to identify files needing review before code changes. Open source, 97% precision across 16 repos.
Guide to running AI locally versus in the cloud for HIPAA compliance, comparing local deployment with enterprise cloud options.
Tandem: real-time collaboration tool enabling users to work with Claude Code on documents through annotation, highlighting, and chat features.
Analysis of why human editors fail to identify AI-generated content and how widespread slop content passes editorial review.
Self-hosted media streaming stack using Jellyfin, Sonarr, Radarr and other arr ecosystem tools with Docker Compose.
UI Automata: Windows desktop automation framework for AI agents, enabling Claude to perform GUI tasks like software installation.
Video discussing current state and future directions of LLM-based code generation as of 2026.
AutoAgent: system enabling AI agents to autonomously improve their own configuration by modifying prompts, tools, and parameters based on benchmark scores.
MicroSafe-RL: deterministic safety layer for edge AI on microcontrollers achieving microsecond response times.
Status page notification subscription for Claude.ai service incidents.
Video about AI applications in Nvidia chip design.
Personal essay on productivity gains and mental exhaustion from using Claude Code for startup operational tasks.
Analysis of AI agent abuse patterns and security issues visible in API logs during 2026.
Syz: cross-platform Rust CLI tool for interactive exploration of file and directory sizes with recursive scanning.