ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
ABot-N0: a unified Vision-Language-Action (VLA) foundation model for embodied robot navigation across five core tasks, using a hierarchical brain-action architecture.
SPES enables memory-efficient decentralized LLM pretraining using mixture-of-experts and distributed GPUs without full model replication on each node.
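The summary does not say how SPES partitions experts across nodes. The sketch below shows only the generic top-k routing that makes mixture-of-experts layers sparse: each input activates only k experts, so the other experts' weights are not touched for that input. The function names and toy experts are hypothetical illustrations, not SPES's implementation.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return dict(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    routes = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in routes.items())


# Hypothetical toy experts: each just scales its input.
experts = [lambda x: 1.0 * x, lambda x: 2.0 * x, lambda x: 3.0 * x]
print(moe_forward(1.0, experts, [0.0, 1.0, 1.0], k=2))  # experts 1 and 2, equal weight
```

With logits `[0.0, 1.0, 1.0]` and `k=2`, expert 0 is never evaluated. In a decentralized setting, this sparsity suggests why a node may not need every expert resident, but how SPES exploits it is not described here.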
INTENT: a framework for budget-constrained LLM agents that solve multi-step tasks under monetary constraints via intention-based planning.
MuRGAt benchmark evaluates multimodal LLM attribution and factual grounding across complex reasoning tasks involving multiple modalities and information sources.
MolmoSpaces: an open ecosystem for large-scale benchmarking of robot navigation and manipulation policies across diverse scenarios.
Voxtral Realtime: a streaming speech recognition model that achieves sub-second latency, trained end-to-end for audio-text alignment.
GameDevBench evaluation framework tests multimodal AI agent capabilities on game development tasks combining code, shaders, sprites, and animations.
ABot-M0: a VLA framework with action manifold learning for robotic manipulation; includes a data curation pipeline for heterogeneous embodiment data.
MOSS-Audio-Tokenizer proposes end-to-end discrete audio tokenization using homogeneous architectures for improved reconstruction and scaling in audio foundation models.
A2A framework enables connecting AI agents across different frameworks and teams for interoperability.
Research study on attention masking strategies in decoder-only LLMs for user representation learning using contrastive learning on large-scale behavioral data.
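The summary does not list which masking strategies the study compares. As an assumed illustration only, the sketch below builds three common boolean attention masks for a sequence of length `n` (`True` means "query row may attend to key column"): the causal mask that decoder-only LLMs use by default, a fully bidirectional mask, and a prefix-LM mask that is bidirectional over a prefix and causal afterwards.

```python
def causal_mask(n):
    """Decoder-only default: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def prefix_mask(n, prefix_len):
    """Bidirectional inside the first prefix_len positions, causal afterwards.

    Prefix positions cannot see later (suffix) positions; suffix positions
    see the whole prefix plus earlier suffix positions.
    """
    return [[j < prefix_len or j <= i for j in range(n)] for i in range(n)]


for row in prefix_mask(4, 2):
    print(["x" if allowed else "." for allowed in row])
```

Which of these, if any, the study found best for user representations is not stated in the summary.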
Neural Additive Experts: a framework that balances interpretability and accuracy in generalized additive models through feature interaction gating.
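How Neural Additive Experts implements its gating is not described here. The sketch below is a hypothetical illustration of the trade-off the summary names: a generalized additive model (one shape function per feature) plus pairwise interaction terms scaled by a single scalar `gate`. With `gate = 0.0` the model is a pure GAM whose per-feature contributions can be read off individually; raising the gate adds interactions that may improve fit but blur that attribution. The single scalar gate and all names are assumptions.

```python
def gated_additive_predict(x, shape_fns, pair_fns=None, gate=0.0, bias=0.0):
    """Additive main effects plus a gated sum of pairwise interaction terms.

    Returns (prediction, per-feature main-effect contributions).
    gate = 0.0 reduces to a plain GAM: prediction = bias + sum of main effects.
    """
    main = [f(xi) for f, xi in zip(shape_fns, x)]
    interaction = 0.0
    for (i, j), g in (pair_fns or {}).items():
        interaction += g(x[i], x[j])
    return bias + sum(main) + gate * interaction, main


# Hypothetical shape functions for two features and one interaction term.
shape_fns = [lambda v: 2.0 * v, lambda v: v * v]
pair_fns = {(0, 1): lambda a, b: a * b}

print(gated_additive_predict([1.0, 3.0], shape_fns, pair_fns, gate=0.0))
print(gated_additive_predict([1.0, 3.0], shape_fns, pair_fns, gate=0.5))
```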
MetaphorStar uses end-to-end visual RL to improve MLLM understanding of metaphorical content in images, enabling multi-hop reasoning and cultural context awareness.
Feature Activation Coverage (FAC): measures post-training data diversity in LLM feature space for more effective downstream task performance.
A contamination-free medical benchmark for evaluating LLMs with automated rubric evaluation, addressing data leakage and temporal misalignment in clinical settings.
A method for reasoning in continuous latent space rather than discrete tokens, addressing feature collapse in latent reasoning paradigms for LLMs.
Quote reference with no content provided.
Proposes a method for multi-stem music generation with flexible instrument control. Relevant to ML research but outside the core AI agent/LLM focus.
Theoretical and empirical analysis of safety alignment challenges in self-evolving multi-agent LLM systems; identifies a self-evolution trilemma.
χ0 identifies distributional shift across human demonstrations, policy inductive bias, and test-time execution as a robustness bottleneck in robotic manipulation, and proposes an alignment approach.
StealthRL: a reinforcement learning framework that stress-tests the robustness of AI text detectors via adversarial paraphrasing, using GRPO and LoRA adapters.
OneVision-Encoder: codec-aligned sparsity principle for multimodal architectures that process sparse discriminative information efficiently.
Opinion piece questioning whether certain AI approaches represent a viable path to artificial general intelligence.
Video generation framework for precise instance insertion into existing footage with sparse control, moving beyond prompt-engineering toward fine-grained controllable generation.
A curriculum learning approach in which agents use code generation to learn progressively in open-ended environments with foundation models.
Applies deep reinforcement learning with graph neural networks to optimize parallel machine scheduling. Relevant to ML research but not directly related to LLMs or AI agents.
MemFly: a memory optimization framework that uses information bottleneck principles to help LLM agents balance compression and retrieval precision in long-term memory.
Research on explainability methods for agentic AI systems that operate over multi-step trajectories, extending beyond single-prediction interpretability.
Clickbait title with no content provided.
Opinion on common pitfalls that beginners encounter when starting to learn AI.
Sparse video generation framework enables vision-language navigation agents to navigate unknown environments with minimal high-level instructions via beyond-the-view reasoning.
Analysis of GRPO reinforcement learning limitations in LLM reasoning due to implicit advantage symmetry; proposes improvements for exploration and difficulty adaptation.
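The summary does not spell out the paper's analysis. For reference, the sketch below shows the standard GRPO group-relative advantage: each sampled response's reward is normalized by the mean and standard deviation of its own group. Two properties follow directly from this formula: the advantages in a group sum to zero (one plausible reading of "symmetry", which is an assumption here), and a group in which every response gets the same reward yields all-zero advantages, so it contributes no learning signal. Implementations differ on population vs. sample standard deviation; population is assumed here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption; some use sample std)
    return [(r - mean) / (std + eps) for r in rewards]


# Binary correctness rewards for a group of four sampled responses.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# A group where every response is equally good (or bad) gets zero advantage.
print(group_relative_advantages([1.0, 1.0, 1.0]))
```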
GeneralVLA: vision-language-action model with knowledge-guided trajectory planning to improve zero-shot generalization in robotic control.
BPDQ: bit-plane decomposition quantization with variable grid for efficient 2-3 bit LLM inference under memory constraints.
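BPDQ's variable grid is not reproduced here. The sketch below uses a plain uniform grid, which is a swapped-in simplification, to show what a bit-plane decomposition is: weights are quantized to `bits`-bit integer codes, and each code is split into `bits` binary planes with plane b holding bit b. The original codes are then recovered as the sum over b of 2^b times plane b.

```python
def quantize(weights, bits):
    """Uniform asymmetric quantization to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def to_bit_planes(codes, bits):
    """Split each integer code into `bits` binary planes (plane b holds bit b)."""
    return [[(c >> b) & 1 for c in codes] for b in range(bits)]


def from_bit_planes(planes):
    """Reassemble integer codes as sum over b of 2**b * plane_b."""
    return [sum(plane[i] << b for b, plane in enumerate(planes)) for i in range(len(planes[0]))]


def dequantize(codes, scale, lo):
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.2, 0.3, 1.0]
codes, scale, lo = quantize(weights, bits=3)
planes = to_bit_planes(codes, bits=3)
print(codes, planes)
print(dequantize(from_bit_planes(planes), scale, lo))
```

The decomposition itself is lossless. The only error comes from rounding onto the grid, which is at most half a grid step per weight.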
Reasoning Cache (RC) algorithm enables LLMs to improve over long horizons via test-time adaptation and RL, improving extrapolation beyond training distribution.
Investigates computational efficiency advantages of diffusion-based language models versus standard approaches.
Novel method for fine-tuning quantized LLMs using evolution strategies instead of backpropagation, enabling high-precision adaptation on discrete, non-differentiable parameter spaces.
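The summary does not describe how the method handles the discrete parameter grid. The sketch below shows only the generic antithetic evolution-strategies estimator: the gradient is estimated from loss differences at mirrored random perturbations, so only forward evaluations of `loss_fn` are needed and no backpropagation is required. The continuous toy objective and all hyperparameters are illustrative assumptions.

```python
import random


def es_step(params, loss_fn, sigma=0.1, lr=0.05, pop=20, rng=random):
    """One antithetic ES update using only forward evaluations of loss_fn."""
    grad = [0.0] * len(params)
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        plus = [p + sigma * e for p, e in zip(params, eps)]
        minus = [p - sigma * e for p, e in zip(params, eps)]
        diff = loss_fn(plus) - loss_fn(minus)
        for i, e in enumerate(eps):
            # Antithetic estimate of d(loss)/d(param_i).
            grad[i] += diff * e / (2 * sigma * pop)
    return [p - lr * g for p, g in zip(params, grad)]


def toy_loss(p):
    # Hypothetical objective with its minimum at (3, 3).
    return sum((x - 3.0) ** 2 for x in p)


rng = random.Random(0)
params = [0.0, 0.0]
for _ in range(200):
    params = es_step(params, toy_loss, rng=rng)
print(params)
```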
TIC-VLA framework for robot navigation that models delayed semantic reasoning in vision-language-action models for real-time control in dynamic environments.
ECHO-2 distributed RL framework for LLM post-training with remote inference workers, addressing cost efficiency and policy coordination challenges.
Analysis of immigration policy intersection with AI talent acquisition and workforce development.
Discusses trust and governance frameworks in AI deployment and policy.
Opinion article debating whether prompt-driven development qualifies as legitimate programming practice.
Technical guide on implementing agentic workflows for automated document processing and data extraction.
Technical overview of diffusion transformers including vision transformers and diffusion transformer variants.
Explores text diffusion models as an emerging alternative paradigm for language model development.
Overview of enterprise AI adoption, challenges, and potential for funding accessible AI. Discusses organizational AI implementation but lacks technical depth.
Leadership guide on organizational AI adoption, drawn from OpenAI's experience with enterprise clients.
Scania deploys ChatGPT Enterprise across global engineering teams to accelerate learning, building, and innovation in industrial operations.
Technical overview of retrieval-augmented generation pipelines used in LLM applications.
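As a minimal sketch of the retrieve-then-generate pattern such pipelines share, the code below ranks documents by cosine similarity between a query embedding and precomputed document embeddings, then assembles the top-k passages into a prompt. The toy two-dimensional vectors stand in for the output of an embedding model, and the prompt template is a hypothetical example. A production pipeline would typically add chunking, an approximate nearest-neighbor index, and an actual LLM call.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k (doc_id, text) pairs whose embeddings best match the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [(doc_id, text) for doc_id, _, text in ranked[:k]]


def build_prompt(question, passages):
    """Hypothetical template: cite passages by id, then ask the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Toy index of (doc_id, embedding, text); the embeddings are made up.
index = [("a", [1.0, 0.0], "alpha"), ("b", [0.0, 1.0], "beta"), ("c", [0.9, 0.1], "gamma")]
passages = retrieve([1.0, 0.0], index, k=2)
print(build_prompt("What is alpha?", passages))
```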
OpenAI launches localized initiative in Ireland partnering with government and labs to help SMEs and startups adopt AI tools.