COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance
Tool for verifying and explaining RL policies for multi-bridge network maintenance with formal safety guarantees and interpretability.
Generative-reconstructive-discriminative network with ROI attention for industrial surface defect detection and localization.
Re-evaluation of LiRA membership inference attacks under realistic assumptions, questioning prior effectiveness claims with realistic threat models.
Systematic comparison of four training objectives (cross-entropy, prototype, triplet, AP loss) for out-of-distribution detection in image classification.
Procedural dataset generation framework for engine sounds with embedded control annotations for automotive audio synthesis.
Security analysis of large vision-language models vulnerable to semantic slot filling attacks that elicit unsafe outputs.
Algorithm for neural spike waveform compression and classification using adaptive level crossing and latent feature representation.
Hierarchical multi-agent system for Kubernetes autoscaling addressing resource waste through coordinated pod and node scaling policies.
arXiv paper on Staged Multi-Agent Training (SMAT) for co-adaptive exoskeleton control, using curriculum learning to mirror human motor adaptation.
arXiv paper on silicon photonics acceleration for diffusion model inference, targeting energy efficiency of UNet and attention mechanisms.
arXiv paper on physics-based reinforcement learning for data-driven exoskeleton control using joint-moment prediction instead of lab-based inverse dynamics.
arXiv research evaluating synthetic data for baggage trolley detection in airport logistics systems.
arXiv research on federated learning with compression for non-convex optimization on heterogeneous distributed data.
arXiv research on ML-driven microarchitectural techniques addressing memory bottleneck in modern computing systems.
arXiv paper on scaling Mixture-of-Experts model training using Megatron Core, addressing systems challenges in sparse model architectures across memory, communication, and computation.
Differentiable equilibrium blocks for multi-agent incentive design in game theory and economics.
MPC framework for brand auction advertising in real-time bidding systems.
Report on Chinese AI companies distributing 8 billion yuan in coupons during Lunar New Year for agentic AI apps. Market analysis of agent deployment in China.
Postmortem of Tess.Design, AI image marketplace with artist royalties (50%). Launched May 2024, shut down January 2026 with learnings on ethical AI models.
Study evaluating 14 AI agents on 2 benchmarks using 12 metrics that span 4 reliability dimensions. Finds that recent capability gains, as measured by accuracy scores, yield only small improvements in actual reliability.
2016 retrospective guide on navigating PhD programs. Academic career advice unrelated to AI/tech development.
Open source agent framework forking OpenAI's Symphony, using Claude Code for autonomous implementation of Linear board issues. AI agents with LLM integration.
Open source model-agnostic AI code review tool with full control over model choice and costs. Alternative to Claude Code Review.
Andrej Karpathy thought piece on autonomous AI agents conducting frontier research across compute clusters. Speculative/fictional framing of agentic research systems.
Security research on model artifact integrity during local LLM inference in llama.cpp. Creates llm-inference-tampering project targeting inference-layer attacks.
Framework for defining requirements and specifications for AI systems beyond testing/evals. Addresses gap between eval scores and actual user satisfaction in AI products.
Technical overview of Apache Iceberg table format write throughput limitations and transaction handling in data catalogs. Not AI-related.
Mobile app replicating military DAGR GPS navigator functionality for $3.99. Open source tool for land navigation without network access.
PUG: tool that converts messy API documentation into structured CLI tools and MCP servers using LLMs for AI agents.
Discussion about learning retention mechanics that avoid manipulative design patterns.
TLAi+ Benchmarks: dataset and benchmark suite for evaluating LLMs on TLA+ formal specification tasks with diverse problem types.
News article about Iranian drone attacks on AWS data centers in UAE and Bahrain causing service outages.
Autonoma: AI agents that automatically generate test suites and find bugs by navigating applications without manual test scripts.
Rainy Updates: deterministic dependency review and upgrade tool for Node monorepos with CI/CD integration and automated fix PRs.
TrueNAS build system repository moved to closed-source internal infrastructure for security and Secure Boot support.
User request for read-only LLM-powered email triage and knowledge extraction tool without write permissions.
Startup simulator game where players build a SaaS company to $100M ARR using React and Three.js.
Discussion asking about architecture and parameter size estimates for GPT-5.4, Gemini 3.1, and open models.
Guide on getting started with Common Lisp programming language, covering setup and IDE configuration.
Author's account of WebKit issues encountered while building Hopp, covering problems with Tauri and reasons for switching to native Rust.
Essay on limitations of open-weight models released without open training data, discussing post-training challenges for trillion-parameter models.
AI-powered technical interview prep tool simulating realistic interviewer interactions with WebRTC and Socket.io.
Nvidia planning to launch NemoClaw, an open-source AI agent platform for enterprise software companies to dispatch agents.
Plannotator: open source tool for manual code review and feedback loops for autonomous agents. OSS framework for agent improvement via human feedback.
CLI tool using Claude to analyze project codebases and generate customized Claude Code configurations. Integrates with Claude CLI for code-specific setup.
Analysis of SRAM-centric AI accelerators (Cerebras, Groq, d-Matrix) vs GPUs for inference, focusing on near-compute vs far-compute memory tradeoffs.
Agentis: AI-native programming language with LLM as standard library, using binary hashed DAG for version control instead of text files.
Jobbi.app: AI tool that automatically tailors resumes to job descriptions by extracting relevant content from master resume.
Part 2 of observability-driven harnesses for autonomous optimization of systems built with AI agents, focusing on verification loops.
Analysis of Claude Code's /loop feature enabling autonomous agent operation with multiple roles and cadences for AI programming workflows.