Show HN: A real-time strategy game that AI agents can play
Real-time strategy game benchmark where LLMs write code to play. Novel evaluation framework for frontier models.
Web app implementing scientific brainstorming workflow using LLM guidance. Developer-built practical tool.
Opaal: visual designer for multi-agent orchestration prompts. Drag-and-drop workflow builder generating production-ready prompts for coordinated agent tasks.
Commentary on AI code generation impact on writing practices. Claude Code implications.
Microsoft Office bug exposed customer emails to Copilot AI. Security incident with LLM integration.
Research showing LLM-generated passwords are cryptographically insecure despite appearing strong, due to token prediction design. Documents risk in AI agent code generation.
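For contrast with the token-prediction problem described in the entry above, here is a minimal sketch of drawing a password from a cryptographically secure random source with Python's standard `secrets` module. The function name `generate_password`, the alphabet, and the default length are illustrative assumptions, not taken from the research.

```python
import secrets
import string

# Hypothetical alphabet and default length, chosen for illustration only.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Pick each character independently from ALPHABET using the OS CSPRNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

With this 94-character alphabet each position carries about 6.55 bits, so a 20-character password has roughly 131 bits of entropy, independent of any model's token statistics.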
Comb: zero-dependency, hash-chained conversation memory system for AI agents. Persistent state management.
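The hash-chaining idea named in the entry above can be sketched as follows, assuming each stored message records the SHA-256 hash of its predecessor. This is not Comb's actual implementation or API; `GENESIS`, `append`, `verify`, and the entry layout are hypothetical names used only for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first entry


def _digest(prev_hash: str, role: str, content: str) -> str:
    """Hash the entry's fields together with the previous entry's hash."""
    payload = json.dumps(
        {"prev": prev_hash, "role": role, "content": content}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, role: str, content: str) -> dict:
    """Add a message to the chain, linking it to the last entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "role": role,
        "content": content,
        "prev": prev_hash,
        "hash": _digest(prev_hash, role, content),
    }
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every hash in order; any edit to an earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["role"], entry["content"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing or removing an earlier message invalidates it and every entry after it, which is what makes tampering with stored memory detectable.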
Experience report building a Node.js native module for EXIF metadata with LLM assistance, showing practical application of code generation for cross-language library porting.
AI video generation tool for non-coders. Text-to-video and image generation with creative templates, built in 1 month.
CSL MCP Server: formal verification engine for AI agent safety policies using Z3. Constitutional Specification Language with deterministic runtime enforcement outside the model.
Analysis of LLM inference economics covering pricing trends from Anthropic and OpenAI partnerships, explaining cost structures for serving LLMs at scale.
CompareClaw: comparison tool for OpenClaw agent wrappers and hosting options. Managed services, desktop apps, no-code tools evaluated on deployment speed and pricing.
News: Google, Nvidia announce infrastructure deals at India AI summit. Subsea cables and compute partnerships.
Assay: tool verifying LLM outputs to catch hallucinations. Research finding: RLVF training degrades with more data (91.5% at 120 pairs → 77.4% at 2000 pairs).
PolyMCP: framework exposing Python functions as MCP tools for autonomous agents. Multi-step workflows with adaptive planning and orchestration across services.
Study: Claude, ChatGPT, and Gemini generate passwords that appear strong but are easily guessable. Testing reveals bias in LLM password generation.
Teapot: penetration testing methodology and harness for voice-based AI agents. Systematic test framework for security assessments of voice interfaces.
Production-ready AI agent templates using x402 micropayments for API access without signup/KYC overhead. 5 templates, 93 tests passing, HTTP-native payments on USDC/Base.
Research on prompt coupling: prompts optimized for one LLM fail on others. Format changes swing accuracy by 78 percentage points; best practices overlap by less than 20% between model families.
Research on DPO and GRPO training: GRPO with a group size of 2 performs comparably to DPO in on-policy settings.
Analysis: local AI coding agents will shift toward async remote agents as automation increases, changing developer workflows.
Browser Terminal Use is a Chrome extension and CLI tool for running commands in browser terminals from a local shell, enabling agent loops in a browser context.
Clawlet: single-binary AI agent with built-in hybrid semantic memory (vector + full-text search) using bundled SQLite, no external dependencies.
Explores AI agent skills as reverse engineering playbooks for repeatable analyst-in-the-loop tasks. Formalizes skills as reusable temporal policies for agent composition.
kkr-query2xlsx is a SQL runner with GUI/CLI that exports query results to XLSX/CSV. General developer tool, not AI/ML related.
TextAnimations.online generates MP4/GIF animations from text prompts using LLM-generated HTML/JS rendered client-side. Uses LLM for creative generation.
Seamless Auth is an open-source passwordless authentication system using WebAuthn and passkeys. Infrastructure tool, not AI/ML focused.
Personal anecdote about productivity with minimal technical content.
Python library implementing Multi-Head Latent Attention for KV cache compression in transformer models. Achieves 2-16x compression on LLaMA, Mistral, Qwen with Riemannian optimization.
Clawy is a $20 hardware companion device that visualizes Claude Code operations, showing tool execution, permissions, and interactions via physical interface.
Open source collection of 16 product skills for AI coding agents; encodes PM frameworks for Claude.
Product analytics platform for conversational AI agents; analyzes user interactions and pain points.
Open source security firewall for AI agents built in Go; prevents unauthorized access and actions.
Browser automation tool using LLMs for task execution; minimal detail provided.
Trivia game using chatbots with no technical substance.
satgate-proxy enforces hard budget caps on MCP tool calls for AI agents. Zero dependencies, runs locally via Node.js, works with Claude Desktop and Cursor.
Openfuse is a circuit breaker tool for microservices that addresses distributed system challenges. Not AI/ML related.
arXiv framework announcement with no actual content about AI agents or research.
Baseline Core: open-source MIT-licensed skill system for product teams. Structured framework integrating Claude Code with product workflow methodology.
Owoa: generative AI-based image watermarking resistant to camera capture. Solves the analog hole in digital rights management.
vibe-infer documents learning GPU programming with Claude Code, showing the iterative back-and-forth process of AI-assisted learning rather than polished final results.
Prodlint: scans AI-generated JavaScript/TypeScript for production issues. Detects hallucinated imports, missing auth, exposed secrets, and N+1 queries without requiring an LLM.
Security layer preventing AI agents from accessing raw API credentials. Agents make calls through a secure proxy that replaces curl and native clients.
AI support agent system for small business customer support. Built by developer using agent frameworks for technical support automation.
Memory layer for local LLMs using reflection to improve context awareness. Enables persistent learning for offline models.
Analytics tool tracking AI agent traffic to documentation. Title only, minimal content.
Non-technical founder building production SaaS using AI assistance. Title only, insufficient detail.
Discusses security and safety risks LLM agents pose to open-source projects. Title only, minimal content.
P2P payment and discovery layer for autonomous AI agents using blockchain, enabling agent-to-agent settlement without intermediaries.
Koyeb joins Mistral AI to build compute infrastructure for frontier model training and AI software deployment.