Multi-agentic Software Development is a Distributed Systems Problem (AGI can't save you)
Research on using choreographic languages as a formalism for describing multi-agent LLM workflow coordination, framing it as a distributed systems problem.
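The core choreographic idea behind this item can be illustrated with a toy sketch (an assumption for illustration, not code or roles from the research itself): write one global protocol among agent roles, then mechanically "project" it into a per-role program of send/recv actions. Endpoint projection is what rules out mismatched sends and receives, a classic source of distributed-systems deadlock, by construction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Comm:
    sender: str      # role emitting the message
    receiver: str    # role consuming it
    label: str       # what is communicated (e.g. a task spec or review)

# Global choreography (hypothetical roles): planner hands a task to a coder,
# coder sends a patch to a reviewer, reviewer reports back to the planner.
choreography = [
    Comm("planner", "coder", "task_spec"),
    Comm("coder", "reviewer", "patch"),
    Comm("reviewer", "planner", "review"),
]

def project(chor: list, role: str) -> list:
    """Endpoint projection: keep only the actions visible to `role`."""
    actions = []
    for c in chor:
        if c.sender == role:
            actions.append(f"send {c.label} to {c.receiver}")
        elif c.receiver == role:
            actions.append(f"recv {c.label} from {c.sender}")
    return actions

for role in ("planner", "coder", "reviewer"):
    print(role, project(choreography, role))
```

Because every role's local program is derived from the same global description, no agent can wait on a message that nobody sends, which is the distributed-systems guarantee the title is pointing at.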
QitOS is a research-first framework for building reproducible LLM agents with clean module design, benchmarks, and built-in observability.
Research on choreographic languages for managing multi-agent LLM coordination as a distributed systems problem with new programming language design.
Business impact of AI search engines: HubSpot lost 140M visits as search behavior shifted toward AI-powered tools.
Production-grade skills framework for AI coding agents. Encodes workflows, quality gates, and engineering best practices as reusable skills activated via slash commands.
Title only with minimal metadata. No substantive content provided.
AEGIS: Scaling homomorphically encrypted transformer inference via hybrid parallelism on multi-GPU systems. Privacy-preserving ML optimization; niche application.
Circuit duplication technique for frozen vision transformer inference on marine species classification. ML optimization in an off-topic domain.
MetaSAEs: Introduces a decomposability penalty for training sparse autoencoders with atomic latents. Relevant to alignment and safety applications.
Compares RAG vs standard approaches for Agile story point estimation in sprint planning. arXiv study on LLM application.
TRACE: Study on how LLMs allocate trust between conflicting code, documentation, and tests. Evaluates trustworthiness in AI-assisted software engineering.
ExpressEdit: Photoshop plugin using diffusion models for facial expression editing. Computer vision application; off-topic.
RDFace benchmark dataset for rare disease facial phenotype analysis in children, with synthetic data generation. ML research in an off-topic domain.
Introduces vocabulary dropout technique to solve diversity collapse in co-evolutionary LLM self-play curriculum learning. arXiv paper with novel method.
LLM-powered evolutionary search automatically discovers unsupervised uncertainty quantification methods as Python programs for claim verification.
Fine-tuning approach adapting DeepSeek-OCR-2 for optical chemical structure recognition by formulating the task as image-to-text generation.
Study of brain-LLM alignment during creative divergent thinking tasks, measuring correlation between model performance and human neural activity.
VisionClaw wearable AI agent on Meta Ray-Ban glasses combining egocentric perception with speech-driven task execution via OpenClaw agents.
Sim2Real-AD framework for zero-shot sim-to-real transfer of VLM-guided RL policies from CARLA simulation to physical autonomous vehicles.
Dynamic model analyzing productivity-skill tradeoffs when workers use AI tools, decomposing productivity effects into expertise-dependent and independent channels.
Taxonomy of LLM-based coding agent architectures analyzing scaffolding code patterns including control loops, tool definitions, and context strategies.
Novel salient object detection method based on user needs rather than visual stimuli alone.
LangFIR uses sparse autoencoders on monolingual data to discover language-specific features for steering LLM output language without parallel corpora.
AgenticFlict dataset of merge conflicts from AI coding agent pull requests on GitHub, studying integration challenges in collaborative AI-assisted development.
Video diffusion framework (CRAFT) for generating synthetic bimanual robot manipulation demonstrations with temporal coherence.
Phase-aware suppression method to reduce hallucinations in Vision-Language Models without iterative optimization overhead.
SecPI framework for secure code generation using reasoning LLMs through security reasoning internalization, addressing inference-time vulnerability mitigation.
Actor-critic reinforcement learning approach for multi-robot task allocation with asymmetric arrivals and switching delays.
Neural method for black-box global optimization using iterative refinement from noisy samples, addressing multi-modal function optimization.
LLM-based approach for multi-file repository code generation with executable validation, addressing dependency resolution and integration challenges.
LiveCoder framework for repository-level code generation preserving and reusing task-specific state across multiple LLM attempts.
Generative foundation model for multimodal histopathology that imputes missing modalities from incomplete medical data.
Reinforcement learning approach for environments with delayed feedback using homomorphic state representation.
Method for stable unsupervised self-evolution of multimodal LLMs using continuous softened retracing resampling for feedback quality.
Adaptive Relational Transformer for pedestrian trajectory prediction using temporal-aware relations in robotics.
Microservice system using NLP and deep learning to automate classification of citizen appeals in government services.
Unlocks prompt infilling in masked diffusion language models by applying full-sequence masking during supervised finetuning.
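The mechanism this item names can be sketched in a few lines (a toy illustration; the paper's exact masking recipe is an assumption here): standard supervised finetuning for masked diffusion LMs masks only response tokens, so the model never learns to fill masked spans inside a prompt. Full-sequence masking instead draws mask positions from the whole sequence.

```python
import random

def mask_positions(seq_len, prompt_len, full_sequence, mask_rate=0.3, rng=None):
    """Choose which token positions to mask for one training example.

    full_sequence=False: mask only response positions (>= prompt_len),
    as in standard SFT. full_sequence=True: sample mask positions over
    the entire sequence, so prompt infilling is also trained.
    """
    rng = rng or random.Random(0)
    candidates = range(seq_len) if full_sequence else range(prompt_len, seq_len)
    return sorted(p for p in candidates if rng.random() < mask_rate)
```

With response-only masking, every masked position lies at or beyond the prompt boundary; flipping the flag is the only change needed to expose prompt tokens to the infilling objective.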
LightThinker++ enables LLMs to dynamically compress intermediate reasoning thoughts into compact representations for efficiency.
Uses LLMs to capture semantic relationships for tail-item sequential recommendation, addressing sparse interaction problem.
RDEx-CMOP is a differential evolution algorithm variant for constrained multiobjective optimization under budget constraints.
Graph learning approach for melanoma detection in dermoscopic images using graph signal processing.
Scientometric analysis of 15 years of augmented human research, examining conference evolution and core themes.
CREBench evaluates LLMs on cryptographic binary reverse engineering, assessing capabilities for vulnerability discovery and malware analysis.
Research identifying limitations in universality of linear truth directions in LLM activation spaces across different settings.
Study measuring human ability to distinguish LLM-generated news from human-written content across six LLMs.
AutoReSpec uses LLMs to generate formal specifications for programs, addressing syntax and logic errors through techniques for complex control flow.
Neuro-symbolic framework for robot manipulation using vision-language models and autonomous domain construction.
Method for discovering repeated attention patterns in large language models at scale for mechanistic interpretability.
Compares vision-language models and CNNs for spectrum management in satellite-terrestrial networks.
CountsDiff extends diffusion models to discrete ordinal data on natural numbers for generation and imputation tasks.