EnterpriseBench: CoreCraft – Measuring AI Agents in Chaotic RL Environments
Stub entry with no content.
Stub entry with no content.
ETH Zurich research questions value of AGENTS.md files for AI coding, recommending omitting LLM-generated context and limiting to non-inferable details.
Glide LLM proxy auto-switches to faster model before timeout, solving latency issues in AI coding agents by providing sub-2s responses.
Sarvam releases open-source 30B and 105B reasoning models trained from scratch in India with in-house datasets and compute, full-stack optimization.
Essay on shifting from manual prompt-response interaction to autonomous AI agent workflows using multi-step blueprints instead of iterative prompting.
Guide on effective AI coding workflows: architect system design first, then use AI for implementation details rather than generating entire projects at once.
Essay examining human-AI agent partnerships, exploring how agents recruit humans as sensors, verifiers, and liability bearers in autonomous systems.
MCP server connecting Claude Desktop to live Smalltalk environments. Browse classes, evaluate code, autonomous code review against running Squeak/Cuis image.
Mb-CLI provides read-only Metabase API terminal interface designed for humans and AI coding agents to explore databases and run ad-hoc queries.
Viral.ad tool converts product URLs to UGC video ads using AI script generation, voice synthesis, and video production in under 5 minutes.
Stub entry with no content.
macOS menu bar app for checking Claude Code subscription usage without interrupting running agent sessions.
Go implementation of OpenAI's Symphony with multi-agent coordination. Watches Linear board, spawns agents per task, lands PRs automatically.
SlideHTML Electron app renders HTML files as slides, enabling AI-generated slide creation workflow with Claude Code or Gemini CLI.
Open-source encrypted data vault for LLM training data using compression, AES-256-GCM encryption, and Merkle trees. REST API included.
Analysis of Claude model behavior: exploiting weight patterns improves code generation. Shows how model weights drive realistic vs unrealistic outputs.
CLI tool tracking spending on AI coding tools. Monitors Claude Code Pro, Cursor IDE, and OpenAI Codex usage and costs in one dashboard.
Unofficial CLI for Resend email service designed for AI agents and humans. Supports macOS, Linux, and Windows with multiple install options.
CLI tool to run AI instructions in shells, scripts, and CI/CD pipelines like Unix tools. Single binary, supports piping and chaining.
Full-Text RSS site extraction rules repository. Content parsing tool unrelated to AI interests.
Android app running Qwen3.5-0.8B locally for offline document AI without cloud upload. Supports PDF Q&A and document summarization.
Discussion of whether AI agents should skip sponsored ad results during automated web research to avoid accidental PPC click fraud.
PR description for Google Workspace CLI MCP support removal. Minimal content, appears corrupted.
Mobile app to remotely control Claude Code and Codex CLI agents from iOS/Android via WebSocket bridge on Mac.
Article claiming Grammarly uses user identities without permission for AI agent expert reviews. Limited details provided.
Guide to C++ for bare metal embedded development. Unrelated to AI/ML interests.
User complaint about Claude.ai email delivery issues and ineffective AI support bot responses.
Gollem: Go framework for production AI agents with type safety, structured output, multi-provider streaming, guardrails, and zero-allocation streaming.
Speculative news digest dated 2026 mixing real and fictional claims about Anthropic, OpenAI, and Claude features.
Claim that LLMs will replace human vulnerability researchers. No technical details or evidence provided.
Gamified product roasting arena. Minimal substance, appears to be placeholder or incomplete content.
Always-on memory agent using Google ADK and Gemini 3.1 Flash-Lite. Implements persistent 24/7 memory consolidation without vector databases or embeddings.
SurvivalIndex project benchmarks what developer tools AI agents actually choose in practice. Finds Claude Code picks custom solutions over available tools in 60% of categories.
Pre-tool-use hook for Claude Code that logs all actions as tamper-evident audit trail, catching attempted production config poisoning.
Lightweight CLI for creating and deploying MDX blogs. Designed with AI agents in mind for blog automation.
2026 directory of frontier AI labs and emerging research labs worldwide including AMI Labs, with contact and location info.
Security disclosure of supply chain vulnerability in Cline AI coding tool via compromised issue triage bot. Real exploit of AI agent.
Monetization platform with managed S3 exports for billing data. Simplifies setup for usage-based pricing.
Autonomous browser agent that decomposes test intentions into phases and executes frontend tests. AI-driven test automation tool.
News article discussing consumer use of ChatGPT for medical advice. General interest piece without technical depth.
Technique to summarize AI agent context via compression instead of truncation. Preserves task continuity and memory for long-running agents.
Legal complaint about Meta sending nude video from AI glasses to workers.
Crypto data enrichment API on Base mainnet allowing AI agents to pay USDC per request for price data, trading signals, and market analysis.
User reports Azure Data Factory pipeline delays in East US 2 region, speculates AI may have introduced bugs in production deployment.
News headline about US AI guidelines and Anthropic. Minimal content provided.
Open source AI assistant converting natural language to Python code. Features autonomous execution loop, persistent mistake memory, voice input, and internet integration.
Claude-consensus: Multi-model code review plugin running GPT, Gemini, Grok in parallel for code analysis with consensus convergence.
MetalRT: LLM inference engine for Apple Silicon achieving 658 tok/s on M4 Max, 1.67x faster than llama.cpp in benchmarks.
Minimal web GUI for AI coding agents. Currently supports Codex CLI with Claude Code support planned. Early stage.
Military exercise cancellation and troop deployment speculation. Not related to AI/tech.