Eight years of wanting, three months of building with AI
Case study: eight years of ideation and three months of building syntaqlite with AI. SQLite linting and verification devtools built using agentic engineering.
CLI tool generating AI-optimized hierarchical context maps for codebases using three-phase LLM-based discovery. Open source, GitHub Actions compatible.
Holos: Web-scale LLM-based multi-agent system addressing coordination, scaling, and value dissipation in heterogeneous agent ecosystems.
XpertBench: High-fidelity benchmark with rubrics-based evaluation assessing LLMs on authentic expert-level complex, open-ended tasks.
Neuro-symbolic architecture combining neural networks and symbolic systems for structured reasoning on abstract reasoning tasks with improved generalization.
Theoretical analysis of generative AI using threshold logic and high-dimensional geometry to understand neural computation and dimensionality transitions.
AIVV: Neuro-symbolic LLM agent-integrated framework for verification and validation of autonomous systems combining deep learning and symbolic reasoning.
Research demonstrating that state-of-the-art AI agents suppress evidence of fraud and harm when aligned with corporate interests, exploring agentic misalignment.
Deep reinforcement learning for bridge infrastructure optimization using element-level condition states and risk-based management.
Neuro-symbolic architecture combining knowledge graphs and RAG for culturally accurate heritage storytelling, reducing LLM hallucinations.
Research on mitigating LLM biases toward spurious social contexts using direct preference optimization for high-stakes decision-making applications.
Mechanistic interpretability study of audio-visual large language models examining how audio/visual features fuse and surface in text generation.
AutoVerifier: LLM-based agentic framework that automates verification of technical claims without domain expertise by decomposing complex claims.
Research on ontology-oriented knowledge graph construction using intrinsic-relational routing to improve schema reusability and downstream tasks.
Interactive optimization agents enabling conversation-based problem modeling and solution refinement with decision-makers through LLM capabilities.
Multi-agent RL system achieving grandmaster-level competitive programming performance, demonstrating agentic capabilities beyond previous AI benchmarks.
Benchmark for testing belief revision in logical reasoning models under minimal premise changes, evaluating dynamic reasoning capabilities.
Neuro-symbolic dual memory framework for long-horizon LLM agents addressing progress drift and feasibility violations in embodied and web interaction tasks.
Addresses role specification failures in LLM multi-agent systems through quantitative role clarity metrics and role assignment matrices.
Tool-integrated visual reasoning approach for charts using a dual-source data pipeline that combines synthesized charts with real data for MLLM training.
Event-driven synthetic benchmark for longitudinal health agents reasoning over multi-source trajectories including device streams and clinical data.
Efficient majority voting method for multi-agent systems that stops early once consensus is achieved, reducing computational overhead through agent scheduling.
Applies MT-GRPO and GTPO reinforcement learning for training tool-calling agents on multi-turn customer service tasks with sparse reward credit assignment.
Analyzes frontier LLMs on classic AI planning problems, examining whether models reason optimally or rely on heuristic strategies in the Blocksworld domain.
Benchmark for evaluating harmful behavior in computer-use agents, testing safety risks from sequences of individually plausible but collectively harmful actions.
Analysis of reasoning failures in large reasoning models, showing that the first solution is often optimal despite test-time scaling patterns in DeepSeek-R1.
Scalable hierarchical parallel agent framework for web information seeking, addressing wide-scale evidence synthesis and context saturation in LLM agents.
Benchmark evaluating multimodal LLM agents with tool integration capabilities including visual expansion and web search through agentic reasoning.
AI system that automatically formalizes a 500+ page graduate-level algebraic combinatorics textbook in Lean, producing 130K lines of formal code.
Reinforcement learning approach to improve visual reasoning in chart question answering using vision language models with policy optimization.
Framework for agentic AI emphasizing control, memory, and verifiable action under partial observability, inspired by comparisons with squirrel ecology.
Evaluates linguistic graph representations combined with pretrained Transformers for language modeling, comparing semantic and syntactic formalisms.
Bayesian and neural models analyzing Chinese learners' English preposition comprehension, using pretrained language models for linguistic analysis.
Research on language modeling with predicted semantic structure, establishing empirical lower bounds for performance improvements using binary vector representations.
Reinforcement learning approach using process rewards to provide intermediate feedback for multi-step mathematical reasoning in LLMs.
Study of LLM-generated text compression using domain-adapted LoRA and arithmetic coding, characterizing lossless and lossy compression frontiers.
Framework for scaling GUI agents using synthetic environmental dynamics and self-supervised learning from ground-truth interaction feedback.
Benchmark for evaluating LLMs and embeddings on drug discovery tasks including hypothesis generation and candidate prioritization.
Offline preference-based RL method improving query efficiency by addressing exploration and preference ranking within existing datasets.
Neural architecture performing discrete symbolic constraint reasoning while maintaining differentiability for planning and feasibility checking.
Study using contrastive prompt tuning to optimize LLMs for generating energy-efficient code supporting Green Software Development.
Framework for zero-shot transfer between RL agents using interpretable discrete concepts validated through causal intervention.
Dynamic UAV deployment system for vehicular networks using Q-learning with action masking to enhance reliability in urban environments.
Framework using LLMs as judges to evaluate safety of model responses for users with psychosis, addressing clinical validation gaps in mental health.
ML pipeline using ensemble learning to detect internet routing instability from traceroute latency data without control plane information.
Conformer-based model for decoding speech information from high-density EEG using a dual-pathway architecture with ERP and broadband features.
Analysis of agent communication protocols for LLM systems organized into communication, syntactic, and semantic layers with systematic evaluation of 18 protocols.
Survey of AI and ML applications in 6G networks covering high data rates, low latency, and emerging applications like autonomous systems.
Synthetic data pipeline for reasoning in long-document visual understanding that generates thinking traces for improved LLM performance on enterprise documents.
Framework addressing underspecified natural language requests for cloud infrastructure code generation using LLMs with multi-level disambiguation.