Thrum – Agent coordination through messaging
Thrum is a persistent messaging layer for AI agents across sessions and machines, with CLI and MCP server support for Claude Code agents.
Voibe is a private offline dictation app for macOS with voice-to-text; no cloud transmission required.
ML project predicting Linux game compatibility (0.871 F1) using human-AI collaboration. Software engineer with minimal ML background used Claude to discover novel statistical techniques.
Open-source dependency manager for AI agents (apm). Standardizes agent configuration, skills, prompts across Claude, Copilot, Cursor. Similar to package.json for agents.
N0x: browser-based LLM inference, autonomous agents, RAG, and Python execution via WebGPU and Pyodide; no backend or data sharing.
Development workflow framework for AI coding agents. Composable skills and instructions guide agent behavior from initial requirements gathering through code generation.
GitHub Actions cost calculator. Paste workflow YAML to estimate per-minute pricing across Ubuntu, macOS, Windows runners.
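The core of such a calculator is a per-minute rate table keyed by runner label. A minimal sketch follows; the rates are placeholders chosen for illustration, not GitHub's actual billing figures, and the `minutes` field stands in for durations a real tool would estimate from the workflow YAML:

```python
# Sketch of per-minute cost estimation for GitHub Actions jobs.
# Rates below are illustrative placeholders, not real billing rates.
RATE_PER_MINUTE = {
    "ubuntu-latest": 0.008,   # hypothetical Linux rate, USD/min
    "windows-latest": 0.016,  # hypothetical Windows rate (2x Linux)
    "macos-latest": 0.08,     # hypothetical macOS rate (10x Linux)
}

def estimate_cost(jobs):
    """jobs: mapping of job name -> {'runs-on': label, 'minutes': duration}."""
    total = 0.0
    for job in jobs.values():
        rate = RATE_PER_MINUTE.get(job["runs-on"], RATE_PER_MINUTE["ubuntu-latest"])
        total += rate * job["minutes"]
    return round(total, 4)

workflow = {
    "build": {"runs-on": "ubuntu-latest", "minutes": 10},
    "test-mac": {"runs-on": "macos-latest", "minutes": 5},
}
print(estimate_cost(workflow))  # 0.48
```

A real implementation would parse the workflow YAML to enumerate jobs and matrix expansions before applying the rate table.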
Analysis of longitudinal workplace studies showing productivity collapse when workers manage AI, contradicting optimistic coverage.
Opinion piece criticizing AI's hardware costs, environmental impact, and low-quality software; not research or technical analysis.
Local image compression and file conversion tool using libvips. Desktop app for Linux, Windows, macOS.
Meta's machine translation system extending to 1,600 languages using LLM approaches; advances beyond 200-language NLLB coverage.
Real-time terminal dashboard (llmtop) for monitoring LLM inference clusters supporting vLLM, SGLang, and Ollama with KV cache and latency metrics.
Terminal tool with local AI memory using Ollama. Save/recall commands, notes, URLs via natural language. Runs locally, no cloud.
Rust-accelerated RL framework using Polars pattern: Rust data plane + Python control plane via PyO3. 140x speedup with Rayon parallelism. Published on crates.io with 695 tests.
News article about Iranian security official killed in airstrike; unrelated to AI/tech.
Vibe is a mobile app enabling remote code execution with Claude Code and Gemini CLI, with web preview and session management.
Personal setup combining Claude Code with specialized domain agents, parallel code review, and self-improving knowledge systems.
Programming pattern using filenames as configuration to make programs self-contained and portable without flags or scripts.
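The pattern can be sketched in a few lines: the program parses settings out of its own filename, so copying or renaming the file reconfigures it without flags or config files. The naming scheme (`resize_800x600.py`) and defaults here are invented for illustration:

```python
# Filename-as-configuration sketch: settings live in the script's own name.
import os
import re

def config_from_filename(path):
    # e.g. "resize_800x600.py" -> {"width": 800, "height": 600}
    name = os.path.basename(path)
    m = re.search(r"_(\d+)x(\d+)\.py$", name)
    if not m:
        return {"width": 640, "height": 480}  # fallback defaults
    return {"width": int(m.group(1)), "height": int(m.group(2))}

# In a real script this would be config_from_filename(__file__).
print(config_from_filename("resize_800x600.py"))  # {'width': 800, 'height': 600}
```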
Dropbox optimized their relevance judge using DSPy for Dash, improving ranking and evaluation across multiple ML pipelines at scale.
TrustAgentAI is an open-source accountability layer adding cryptographic receipts and non-repudiation to MCP tool calls for AI agents.
Gas Town is Steve Yegge's agent orchestrator coordinating multiple AI coding agents simultaneously, hosted on Kilo Cloud infrastructure.
HYQNET is a neural-symbolic model that answers complex first-order logic queries on knowledge graphs by integrating interpretability with generalization.
NextMem proposes a latent factual memory framework for LLM-based agents to address limitations of existing textual and parametric memory approaches.
AIDABench: Comprehensive benchmark for AI data analytics and document understanding. Evaluates end-to-end task effectiveness in practical document processing scenarios.
Comprehension-Gated Agent Economy: Formal architecture linking AI agent economic permissions to verified comprehension. Robustness-first approach to agent authorization.
CraniMem: Neurocognitively-inspired gated and bounded multi-stage memory design for long-running LLM agents. Improves retention stability and content consolidation.
GSI Agent: Domain knowledge enhancement for LLMs in green stormwater infrastructure. Combines LLM with domain knowledge for inspection and maintenance guidance.
Cost-sensitive store routing for memory-augmented agents. Formulates selective memory retrieval as routing problem to reduce context tokens and improve efficiency.
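The routing idea reduces to an expected-value calculation per store: retrieve only where expected usefulness outweighs token cost. A toy sketch under made-up store names and numbers (not from the paper):

```python
# Toy cost-sensitive routing: pick the memory store maximizing
# expected hit value minus token cost. All figures are illustrative.
STORES = {
    "episodic": {"token_cost": 1200, "hit_rate": 0.6},
    "semantic": {"token_cost": 300, "hit_rate": 0.4},
    "none": {"token_cost": 0, "hit_rate": 0.0},  # skip retrieval entirely
}

def route(value_per_hit=1000):
    def score(name):
        s = STORES[name]
        return s["hit_rate"] * value_per_hit - s["token_cost"]
    return max(STORES, key=score)

print(route())      # cheap store wins at low stakes
print(route(5000))  # expensive store wins when a hit is worth more
```

The point of the formulation is that "retrieve nothing" is a legitimate route, which is how context tokens get saved.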
DynaTrust: Defense mechanism against sleeper agents in multi-agent systems using dynamic trust graphs. Detects agents that hide malicious behavior until triggered.
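The sleeper-agent failure mode can be illustrated with a single trust edge updated as an exponential moving average of observed behavior: benign history builds trust slowly, but once the trigger fires, repeated malicious observations collapse it. The update rule and parameters are a toy stand-in, not DynaTrust's actual mechanism:

```python
# Toy trust update: EMA of observed behavior (1.0 benign, 0.0 malicious).
def update_trust(trust, observation, alpha=0.3):
    return (1 - alpha) * trust + alpha * observation

trust = 0.5
for _ in range(20):            # long benign phase: sleeper builds trust
    trust = update_trust(trust, 1.0)
assert trust > 0.95            # near-full trust before the trigger

for _ in range(5):             # trigger fires: malicious actions observed
    trust = update_trust(trust, 0.0)
print(round(trust, 3))         # trust collapses within a few steps
```

A trust *graph* generalizes this to per-edge scores between agents, so peers can discount a flipped agent's messages independently.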
Theoretical analysis of Query-Value mechanism in Transformers from linguistic perspective. Explains efficacy of MQA, GQA, and MLA architectures and trade-offs.
Atlas: Memory kernel that compiles task experience into agent instructions without fine-tuning or RAG. Improves agent memory utility via instruction-level compilation.
Quantum-Secure-By-Construction design paradigm for agentic AI systems. Addresses post-quantum cryptographic challenges in long-lived distributed agent deployments.
Latent Posterior Factors framework for aggregating multiple noisy evidence sources without manual feature engineering. Addresses uncertainty in real-world decision-making.
Theoretical characterization of Latent Posterior Factors for aggregating heterogeneous evidence in probabilistic prediction. Formal guarantees for multi-evidence reasoning.
Empirical study measuring LLM robustness to increasing context length on SQuAD and HotpotQA. Analyzes accuracy degradation with context size.
CUBE: Universal benchmark standard for AI agents built on MCP and Gym. Addresses fragmentation by allowing benchmarks to be wrapped once and used everywhere.
Prose2Policy: LLM pipeline translating natural-language access control policies into executable Rego code. End-to-end pipeline with test generation and validation.
Empirical study of GPT-4.1 behavior in gambling tasks under different persona prompts. Examines whether LLM risk behavior reflects principled patterns or prompt mimicry.
Regularized latent dynamics prediction as baseline for behavioral foundation models, examining how state feature choice affects task adaptability and reward function expressivity.
Framework for governing embodied AI in critical infrastructure through hybrid oversight modes and bounded autonomy, addressing resilience beyond statistically representable uncertainty.
AsgardBench evaluates visually grounded interactive planning for embodied AI agents, focusing on high-level action sequence generation with plan adaptation based on visual feedback.
Monte Carlo simulation evaluating prompt engineering strategies for LLM-generated personality assessment items across zero-shot, few-shot, and persona-based designs.
Lean 4 formalization of Vlasov-Maxwell-Landau equilibrium using AI reasoning (Gemini DeepThink) and agentic tools (Claude Code) demonstrating AI-assisted mathematical research workflows.
Framework combining computational argumentation with LLMs to create transparent, verifiable AI agents that reason collaboratively with humans rather than providing opaque recommendations.
Agent Rosetta uses LLMs as specialized scientific agents for protein design tasks, emulating reasoning and tool use for broad design pipelines beyond canonical amino acids.
MAC automatically learns constitutional AI rules from training data using multi-agent approaches, improving upon existing LLM-based prompt optimizers through structured learning.
Formal proof that safety is non-compositional: two individually incapable agents can collectively reach forbidden goals through emergent conjunctive capability dependencies.
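The claim has a simple set-theoretic illustration: model each agent as a capability set and a forbidden goal as a conjunction of capabilities. Neither agent alone covers the goal, but their union does. The capability names are invented for illustration:

```python
# Toy non-compositionality example: a forbidden goal needs a conjunction
# of capabilities that no single agent possesses, but the pair does.
FORBIDDEN_GOAL = {"read_secrets", "exfiltrate"}

agent_a = {"read_secrets", "summarize"}
agent_b = {"exfiltrate", "translate"}

def can_reach(caps, goal=FORBIDDEN_GOAL):
    return goal <= caps  # subset test: goal's requirements all covered

print(can_reach(agent_a))            # False
print(can_reach(agent_b))            # False
print(can_reach(agent_a | agent_b))  # True: union crosses the line
```

This is why per-agent safety checks do not compose: the union of individually safe capability sets can satisfy a forbidden conjunction.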
petscagent-bench evaluates AI-generated scientific code for HPC libraries beyond test-case matching, assessing solver selection, API conventions, memory management, and performance.
Write-time gating mechanism filters incoming knowledge objects based on salience scores to improve retrieval-augmented generation accuracy and mirror biological memory archiving.
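The gating idea can be sketched as a threshold on a salience score computed at write time, so low-value objects never enter the store. The scoring heuristic and field names here are made-up stand-ins for whatever signal a real system would use:

```python
# Write-time gating sketch: archive an object only if its salience
# clears a threshold. Weights and fields are illustrative.
def salience(obj, novelty_weight=0.6, use_weight=0.4):
    return novelty_weight * obj["novelty"] + use_weight * obj["expected_use"]

def gated_write(store, obj, threshold=0.5):
    if salience(obj) >= threshold:
        store.append(obj)
        return True
    return False  # filtered: never written, never retrieved

store = []
gated_write(store, {"id": 1, "novelty": 0.9, "expected_use": 0.8})  # kept
gated_write(store, {"id": 2, "novelty": 0.1, "expected_use": 0.2})  # filtered
print([o["id"] for o in store])  # [1]
```

Filtering at write time, rather than at retrieval, is what keeps the store small and the retrieval signal clean.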
IRAM-Omega-Q computational architecture uses quantum-like density matrices to model internal regulation and uncertainty management in artificial agents under stochastic perturbation.