Width scaling approach for multi-agent systems addressing broad information seeking through organizational capability rather than single-agent depth.
Policy-gradient framework optimizing internal attention distributions in multimodal LLMs for improved reasoning without verbose rationales.
Method for identifying visual concepts in large multimodal models to audit medical decision-making and uncover shortcut behaviors in skin lesion classification.
Domain-specialized financial language model for Indian digital payment systems, developed by NPCI using multi-stage training on 68B tokens.
Space-time regularization using learned finite element methods for inverse electrocardiographic imaging of cardiac electrical activity.
Deep temporal neural hierarchical model for predicting open source software sustainability using contribution patterns and community metrics.
Foundation model for immune system research using multimodal patient-level representations for drug development and translational research.
GPU-Fuzz fuzzer for finding memory errors in deep learning frameworks (PyTorch, TensorFlow, PaddlePaddle) by modeling operator constraints.
Statistical provability theory explaining why agentic theorem provers combining reasoning models with library retrieval, planning, and proof verification achieve strong mathematical reasoning performance.
Theoretical characterization of trainability for instantaneous quantum polynomial (IQP) circuit Born machines as quantum generative models.
Privacy-preserving algorithm for computing top singular vectors using adaptive power iteration with differential privacy guarantees.
Training-free inversion stabilization for rectified-flow generative models, improving reconstruction and editing tasks without additional training.
Selective Abstraction framework for reducing factual errors in LLM long-form generation by enabling partial, uncertainty-based abstention instead of a binary all-or-nothing approach.
Theoretical analysis of logit regularization in linear classifiers, examining implicit bias mechanisms of label smoothing and related convex penalties.
Trajectory self-distillation method for improving few-step decoding in diffusion language models, enabling faster parallel token generation while maintaining quality.
Opinion essay critiquing LLM coding assistants for reliability issues and their impact on developer experience.
Medical vision-language foundation model with entity-aware pretraining for clinical applications. Reports state-of-the-art results on medical benchmarks.
arXiv study evaluating robustness of nine frontier reasoning models under multi-turn adversarial attacks.
Opinion piece on AI Ops as an interview preparation topic, discussing the skills gap between model development and production operations.
Security audit of 8 popular MCP servers identifying vulnerabilities in code execution environments used by AI agents to access databases, filesystems, and APIs.
Security analysis of MCP (Model Context Protocol) servers highlighting schema drift attacks where tool schemas change silently across npm package updates, expanding attack surface for AI agents.
Conceptual piece comparing AI agent design patterns to biological systems. Limited technical detail.
News about Claude free tier improvements and ChatGPT ads. No technical depth.
Tool for poisoning audio files against AI model training. Data protection technique relevant to ML research.
Custom AI agent built with OpenClaw, replacing a $500/month SaaS stack. Handles Instagram DMs, content posting, and competitor tracking on a Mac Mini.
Self-hosted job automation tool using an LLM for job-fit scoring. Scrapes LinkedIn/Indeed/Glassdoor, tailors resumes, tracks replies. Docker-based with SQLite.
macOS utility for applying custom app icons via rules configuration.
Medical RAG system combining FHIR standards with Milvus vector database for healthcare document processing.
SDK tool tracking AI model usage and customer profitability. Integrates OpenAI costs with Stripe revenue. TypeScript/Python/REST.
Design framework for execution boundaries and responsibility structures in autonomous AI systems interacting with the physical world.
Ring programming language version 1.26 release with new games and packages.
Reflection on productivity trade-offs and coding practices when using AI code assistants.
AI agent combat arena/benchmark tool. Minimal details but relevant to agent research.
AgentDaddie enables one-click deployment of the OpenClaw AI agent framework on DigitalOcean.
Collaborative math document tool with AI integration and MCP interface. Supports embedding and shareable links.
Federated open-source platform for poverty alleviation with encrypted messaging. Social impact focus, not AI-specific.
AgentAudit uses multi-agent consensus for security audits, with cross-validated findings for a package registry.
Security audit of 194 AI agent packages (MCP servers, npm, pip) across 211 reports identifying 118 vulnerabilities in the agent ecosystem.
Tool for capturing and transferring AI coding agent expertise across platforms. Minimal details provided.
UK regulatory news about AI chatbot safety for children. Policy/regulation, not technical content.
Clawty: open-source tool for texting Claude Code prompts from a phone via an SMS interface.
DIY factory machines for boat cleaning product manufacturing.
Book Digest: LLM application for generating 2500+ word AI-powered book summaries with iterative prompt optimization.
High-performance SIMD CSV parser in Rust using memchr optimization techniques.
OpenAI recruits Peter Steinberger, developer of the OpenClaw AI agent framework.
Interactive visualization of ARM64 instruction set using Hilbert curve space-filling approach.
Security plugin for OpenClaw AI agent tool calls. Implements deny-by-default access control, rate limiting, and injection detection at process level.
Glupe: Tool isolating AI agent logic into semantic containers to prevent hallucinations and unintended code modifications.
GPU-accelerated PDF renderer in C++ with .NET wrapper and WPF viewer.
Imandra CodeLogician combines LLMs with formal methods for code verification and reasoning.