Sovereign AI
Promotional content about decentralized AI using InnoChain technology to democratize model training.
Overview of AI adoption across Chinese apps, universities, and consumer products with government support and constraints.
Analysis of exposed Claude Code source revealing AI engineering practices: 64K lines of core TypeScript in customer-facing code.
Audio Flamingo Next: open-source audio-language models for speech, sound, and music understanding.
Google Labs Whisk AI: free image generator that blends three visual inputs (subject, scene, style) using Gemini and Imagen 3.
Research showing LLMs respond to social persuasion techniques (authority, commitment, unity) similarly to humans, raising compliance concerns.
Open source agentic integration platform with CLI that auto-generates integration code from natural language descriptions.
Two-stage semantic chunking pipeline for RAG using LlamaIndex: structural splitting then semantic coherence for better document handling.
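The two-stage pipeline above can be sketched in plain Python without LlamaIndex, to keep it self-contained: stage one splits on structural boundaries (blank lines), stage two merges adjacent pieces whose embeddings are semantically similar. The `embed` function here is a bag-of-words stand-in for a real embedding model, and the 0.3 threshold is an illustrative assumption.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def structural_split(doc: str) -> list[str]:
    # Stage 1: structural splitting on paragraph boundaries.
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def semantic_merge(parts: list[str], threshold: float = 0.3) -> list[str]:
    # Stage 2: merge adjacent parts that stay semantically coherent,
    # so each final chunk covers one topic.
    if not parts:
        return []
    chunks = [parts[0]]
    for part in parts[1:]:
        if cosine(embed(chunks[-1]), embed(part)) >= threshold:
            chunks[-1] += "\n\n" + part
        else:
            chunks.append(part)
    return chunks
```

In a real RAG setup the structural pass would follow document markup (headings, sections) and the coherence pass would use dense embeddings, but the control flow is the same.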
Case study of AI vibe coding failure: non-technical person built faulty patient management system instead of using proven solutions.
Analysis of AI agent harnesses vs models for enterprise codebases, comparing Blitzy and GPT-5.4 on SWE-Bench Pro.
Native macOS proxy client built with SwiftUI using Liquid Glass UI design.
Guide on optimizing React Native Android release builds through multi-core CPU utilization and build tool configuration.
Governance control plane for AI systems enforcing human oversight, rollback capabilities, and agent guardrails with audit receipts.
arXivLabs framework announcement for collaborative feature development with focus on openness and data privacy.
User analysis claiming quality regression in Claude Sonnet 4.6 based on 60-day conversation logs tracking instruction repetition frequency.
Desktop and web viewer for Claude Code session logs with expandable tool calls and token tracking, built with Tauri and React.
Introduces Introspective Diffusion Language Models (I-DLM) using strided decoding to improve parallel token generation quality versus autoregressive models.
Open source reading app for children using DISTAR phonics method. Not AI/ML focused.
Bomberman-style 1v1 game benchmark where LLM agents compete in real-time interactive environment, inspired by ARC-AGI 3.
Hyperdimensional computing architecture based on Galois-field algebra showing path-dependent semantic selection mechanism.
Double-agent defender using theory-of-mind reasoning to protect LLMs from belief-steering attacks in adversarial dialogue.
AffordSim generates synthetic robotic manipulation data incorporating object affordances for semantically correct grasp and interaction trajectories.
Legal2LogicICL uses diverse few-shot learning with LLMs to improve generalization when converting legal cases to logical formulas.
Geometric methodology to mitigate shortcut learning and demographic bias in deep neural networks through topological constraints.
Evaluates robustness of watermarking techniques for autoregressive image generators against detection evasion and removal attacks.
Studies whether LLM-based agents improve cooperation in common-pool resource management through structured leadership and election mechanisms.
Game theory analysis of routing decisions with memory constraints and endogenous information recall using logit choice models.
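The logit choice model underlying such routing analyses has a simple closed form: a traveler picks route i with probability proportional to exp(-θ·cost_i), where θ tunes sensitivity to cost differences. A minimal sketch (the costs and θ here are illustrative, not from the paper):

```python
import math

def logit_choice(costs: list[float], theta: float = 1.0) -> list[float]:
    # Logit (softmax) route-choice probabilities: lower cost gives
    # higher probability; larger theta sharpens the preference.
    utils = [math.exp(-theta * c) for c in costs]
    z = sum(utils)
    return [u / z for u in utils]
```

With equal costs the model splits traffic evenly; as θ grows, probability mass concentrates on the cheapest route.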
Multi-ORFT stabilizes online reinforcement fine-tuning for multi-agent diffusion models in cooperative autonomous driving scenarios.
Analysis of discourse diversity in multi-turn empathic dialogue, examining LLM formulaicity beyond single-turn settings.
Grounded world models for visuomotor planning using pretrained vision encoders, enabling semantic generalization without explicit goal images.
StarVLA-α simplifies Vision-Language-Action models for robotic agents by studying unified design choices across architectures and training data.
Efficient KernelSHAP explainability method for patch-based 3D medical image segmentation with reduced computational cost.
Benchmark for evaluating general reasoning capabilities of LLMs across diverse challenging tasks beyond domain-specific reasoning.
Full-stack infrastructure for training, evaluating, and deploying GUI agents with online RL and unified evaluation framework.
Runtime security framework protecting tool-augmented LLM agents against indirect prompt injection attacks through tool-returned content.
Mechanistic analysis of internal dynamics in looped reasoning language models versus standard feedforward models.
Benchmark dataset for detecting AI-generated Chinese text with evaluation across multiple LLM architectures.
Deep learning method for uncertainty quantification in clinical radiotherapy segmentation using budget-aware constraints.
RL approach for training physics reasoning models on simulators to address lack of large-scale QA datasets in physics domain.
Evaluation of LLM causal reasoning capabilities using real-world complex texts with implicit causal relationships.
Benchmark evaluating VLMs' strategic reasoning abilities in multi-agent environments with multimodal observations.
Three-stage pipeline for disambiguation-centric finetuning of enterprise tool-calling LLMs to reduce errors with near-duplicate tools.
Multi-agent LLM system for automated academic poster generation from papers incorporating design and aesthetic principles.
Benchmark and framework for evaluating LLM-driven persuasive dialogue for health behavior change in insulin delivery adoption.
GUI agent framework for multi-step e-commerce risk management handling stateful interactions with dynamic web content.
Interactive learning approach enabling LLMs to improve reasoning through multi-agent interactions during inference without re-execution.
Reward learning method deriving progress estimation signals from passive videos for robotics RL tasks without manual reward engineering.
RL method for improving reasoning in diffusion-based language models using denoising process rewards instead of outcome-only rewards.
Multi-agent LLM system for iterative narrative script refinement using divide-and-conquer approach to improve long-form creative content generation.
RL framework for e-commerce search relevance using stepwise reward optimization to improve LLM-based query-product matching beyond SFT/DPO limitations.