Your terminal finally has memory!
Terminal tool with local AI memory using Ollama. Save/recall commands, notes, URLs via natural language. Runs locally, no cloud.
Rust-accelerated RL framework using Polars pattern: Rust data plane + Python control plane via PyO3. 140x speedup with Rayon parallelism. Published on crates.io with 695 tests.
Vibe is a mobile app enabling remote code execution with Claude Code and Gemini CLI, with web preview and session management.
Personal setup combining Claude Code with specialized domain agents, parallel code review, and self-improving knowledge systems.
Programming pattern using filenames as configuration to make programs self-contained and portable without flags or scripts.
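The filename-as-configuration pattern can be sketched in a few lines. This is an illustrative Python example, not the original author's implementation; the dotted naming scheme and the key names (`job`, `schedule`, `compress`) are assumptions for demonstration.

```python
# Hypothetical sketch of the "filename as configuration" pattern:
# the program reads its settings out of its own invocation name, so a
# single self-contained script can be copied or symlinked under
# different names instead of taking flags or wrapper scripts.
import os
import sys

def config_from_name(argv0: str) -> dict:
    """Parse settings embedded in the program's filename.

    A name like "backup.daily.gz.py" yields {"job": "backup",
    "schedule": "daily", "compress": "gz"} (schema is illustrative).
    """
    stem = os.path.basename(argv0)
    stem = stem.rsplit(".py", 1)[0]          # drop the extension
    parts = stem.split(".")
    keys = ["job", "schedule", "compress"]   # assumed positional schema
    return dict(zip(keys, parts))

if __name__ == "__main__":
    print(config_from_name(sys.argv[0]))
```

Copying the script to `backup.weekly.gz.py` changes its behavior with no flags, environment variables, or config files involved.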
Dropbox optimized their relevance judge using DSPy for Dash, improving ranking and evaluation across multiple ML pipelines at scale.
TrustAgentAI is an open-source accountability layer adding cryptographic receipts and non-repudiation to MCP tool calls for AI agents.
Gas Town is Steve Yegge's agent orchestrator coordinating multiple AI coding agents simultaneously, hosted on Kilo Cloud infrastructure.
HYQNET is a neural-symbolic model that answers complex first-order logic queries on knowledge graphs by integrating interpretability with generalization.
NextMem proposes a latent factual memory framework for LLM-based agents to address limitations of existing textual and parametric memory approaches.
AIDABench: Comprehensive benchmark for AI data analytics and document understanding. Evaluates end-to-end task effectiveness in practical document processing scenarios.
Comprehension-Gated Agent Economy: Formal architecture linking AI agent economic permissions to verified comprehension. Robustness-first approach to agent authorization.
CraniMem: Neurocognitively-inspired gated and bounded multi-stage memory design for long-running LLM agents. Improves retention stability and content consolidation.
GSI Agent: Domain knowledge enhancement for LLMs in green stormwater infrastructure. Combines LLM with domain knowledge for inspection and maintenance guidance.
Cost-sensitive store routing for memory-augmented agents. Formulates selective memory retrieval as routing problem to reduce context tokens and improve efficiency.
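The routing idea can be illustrated with a toy sketch. This is my construction under stated assumptions, not the paper's algorithm: each store advertises an estimated context-token cost, a placeholder scorer estimates relevance, and the router greedily selects stores by relevance per token under a budget.

```python
# Illustrative sketch: selective memory retrieval framed as routing.
# The relevance scorer is a crude stand-in (word overlap with the
# store name); a real system would use embeddings or learned routing.
from dataclasses import dataclass

@dataclass
class Store:
    name: str
    token_cost: int   # tokens the store's results would add to context

def relevance(query: str, store: Store) -> float:
    q = set(query.lower().split())
    return len(q & set(store.name.lower().split("_"))) / max(len(q), 1)

def route(query: str, stores: list[Store], token_budget: int) -> list[str]:
    """Greedily pick stores by relevance-per-token until the budget is spent."""
    ranked = sorted(stores, key=lambda s: relevance(query, s) / s.token_cost,
                    reverse=True)
    chosen, spent = [], 0
    for s in ranked:
        if relevance(query, s) > 0 and spent + s.token_cost <= token_budget:
            chosen.append(s.name)
            spent += s.token_cost
    return chosen
```

Irrelevant stores are never queried, so their contents never consume context tokens, which is the efficiency gain the summary describes.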
DynaTrust: Defense mechanism against sleeper agents in multi-agent systems using dynamic trust graphs. Detects agents that hide malicious behavior until triggered.
Theoretical analysis of Query-Value mechanism in Transformers from linguistic perspective. Explains efficacy of MQA, GQA, and MLA architectures and trade-offs.
Atlas: Memory kernel that compiles task experience into agent instructions without fine-tuning or RAG. Improves agent memory utility via instruction-level compilation.
Quantum-Secure-By-Construction design paradigm for agentic AI systems. Addresses post-quantum cryptographic challenges in long-lived distributed agent deployments.
Latent Posterior Factors framework for aggregating multiple noisy evidence sources without manual feature engineering. Addresses uncertainty in real-world decision-making.
Theoretical characterization of Latent Posterior Factors for aggregating heterogeneous evidence in probabilistic prediction. Formal guarantees for multi-evidence reasoning.
Empirical study measuring LLM robustness to increasing context length on SQuAD and HotpotQA. Analyzes accuracy degradation with context size.
CUBE: Universal benchmark standard for AI agents built on MCP and Gym. Addresses fragmentation by allowing benchmarks to be wrapped once and used everywhere.
Prose2Policy: LLM pipeline translating natural-language access control policies into executable Rego code. End-to-end pipeline with test generation and validation.
Empirical study of GPT-4.1 behavior in gambling tasks under different persona prompts. Examines whether LLM risk behavior reflects principled patterns or prompt mimicry.
Regularized latent dynamics prediction as baseline for behavioral foundation models, examining how state feature choice affects task adaptability and reward function expressivity.
Framework for governing embodied AI in critical infrastructure through hybrid oversight modes and bounded autonomy, addressing resilience beyond statistically representable uncertainty.
AsgardBench evaluates visually grounded interactive planning for embodied AI agents, focusing on high-level action sequence generation with plan adaptation based on visual feedback.
Monte Carlo simulation evaluating prompt engineering strategies for LLM-generated personality assessment items across zero-shot, few-shot, and persona-based designs.
Lean 4 formalization of Vlasov-Maxwell-Landau equilibrium using AI reasoning (Gemini DeepThink) and agentic tools (Claude Code) demonstrating AI-assisted mathematical research workflows.
Framework combining computational argumentation with LLMs to create transparent, verifiable AI agents that reason collaboratively with humans rather than providing opaque recommendations.
Agent Rosetta uses LLMs as specialized scientific agents for protein design tasks, emulating reasoning and tool use for broad design pipelines beyond canonical amino acids.
MAC automatically learns constitutional AI rules from training data using multi-agent approaches, improving upon existing LLM-based prompt optimizers through structured learning.
Formal proof that safety is non-compositional: two individually incapable agents can collectively reach forbidden goals through emergent conjunctive capability dependencies.
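The non-compositionality claim has a simple toy instance (my construction, not the paper's formal proof): a forbidden goal requires a conjunction of capabilities, neither agent alone holds the full set, yet the union of their capabilities covers it.

```python
# Toy illustration of non-compositional safety: exfiltration requires
# both reading the secret AND sending it over the network. Each agent
# alone lacks one capability and is therefore individually "safe".
FORBIDDEN_GOAL = {"read_secret", "network_send"}

def can_reach(capabilities: set[str], goal: set[str]) -> bool:
    return goal <= capabilities   # goal is reachable iff all its
                                  # required capabilities are held

agent_a = {"read_secret", "write_file"}
agent_b = {"network_send", "browse_web"}

assert not can_reach(agent_a, FORBIDDEN_GOAL)   # A alone: safe
assert not can_reach(agent_b, FORBIDDEN_GOAL)   # B alone: safe
assert can_reach(agent_a | agent_b, FORBIDDEN_GOAL)   # together: unsafe
```

Verifying each agent in isolation therefore says nothing about the safety of their composition.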
petscagent-bench evaluates AI-generated scientific code for HPC libraries beyond test-case matching, assessing solver selection, API conventions, memory management, and performance.
Write-time gating mechanism filters incoming knowledge objects based on salience scores to improve retrieval-augmented generation accuracy and mirror biological memory archiving.
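Write-time gating can be sketched as follows. This is a hedged illustration, not the paper's mechanism: the salience scorer here is a placeholder (novelty as the fraction of previously unseen words), where a real system would use a learned score.

```python
# Sketch of write-time gating: instead of storing every incoming
# knowledge object and filtering at retrieval time, score salience
# at write time and archive only items that clear a threshold.
class GatedMemory:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.items: list[str] = []
        self._seen: set[str] = set()

    def salience(self, text: str) -> float:
        """Placeholder salience: fraction of words not seen before."""
        words = text.lower().split()
        if not words:
            return 0.0
        novel = [w for w in words if w not in self._seen]
        return len(novel) / len(words)

    def write(self, text: str) -> bool:
        """Admit the item only if its salience clears the gate."""
        score = self.salience(text)
        self._seen.update(text.lower().split())
        if score >= self.threshold:
            self.items.append(text)
            return True
        return False
```

Redundant writes are rejected at the source, so the retrieval index stays small and high-salience, loosely mirroring how biological memory consolidates only some experiences.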
IRAM-Omega-Q computational architecture uses quantum-like density matrices to model internal regulation and uncertainty management in artificial agents under stochastic perturbation.
Model Workspace Protocol (MWP) simplifies agentic AI orchestration using folder structures for sequential workflows, reducing engineering overhead compared to multi-agent frameworks.
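A folder-as-workflow convention in the spirit of this summary can be sketched minimally (MWP's actual layout may differ; the `prompt.txt`/`output.txt` file names and the numbered-folder ordering are assumptions):

```python
# Speculative sketch: each numbered subfolder is one sequential step;
# a step's accumulated output is written beside it and flows into the
# next step, replacing explicit multi-agent orchestration code.
import tempfile
from pathlib import Path

def run_workflow(root: Path) -> str:
    """Execute steps in lexicographic folder order, piping text forward."""
    payload = ""
    for step in sorted(p for p in root.iterdir() if p.is_dir()):
        prompt = (step / "prompt.txt").read_text()
        # Stand-in for an agent/model call: just append the prompt.
        payload = f"{payload}{prompt}\n"
        (step / "output.txt").write_text(payload)  # auditable per-step artifact
    return payload
```

Because every intermediate result lands on disk next to its step, the workflow is inspectable and resumable with nothing more than a file browser.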
Enhances OpenVLA vision-language-action models with synthetic instruction augmentation to improve zero-shot performance in new environments for embodied AI tasks.
POaaS optimizes prompts for on-device small language models through minimal edits, reducing hallucinations and improving accuracy without requiring lengthy structured instructions.
Context alignment pre-processor enhances LLM dialogue coherence by resolving contextual misalignment when users omit premises, simplify references, or shift context during interactions.
ARISE uses hierarchical reinforcement learning to improve mathematical reasoning in LLMs by developing reusable strategies that accumulate during training rather than treating problems in isolation.
VIGIL deploys edge-resident AI agents for enterprise IT support, performing diagnosis, knowledge retrieval, and policy-governed remediation on user devices with consent and observability.
NeuronSpark: 0.9B-parameter spiking neural network language model using state-space dynamics and surrogate gradients without Transformer distillation.
SQL-ASTRA: agentic reinforcement learning framework for text-to-SQL using column-set matching and trajectory aggregation for credit assignment.
Data contamination audit reveals public LLM benchmarks may be leaked in training data; questions claims of superhuman performance.
Framework for safe LLM-based IoT agents using dual-stage intent analysis to prevent hallucination and reduce interaction overhead.
MOSAIC: modular control token approach for context-dependent safety alignment in LLMs across applications and regions.
Adaptive theory of mind framework for LLM-based multi-agent coordination, aligning agents' reasoning depth about others' mental states.