GENIUS: Generative Fluid Intelligence Evaluation Suite
Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
Implicit Probabilistic Reasoning Does Not Reflect Explicit Answers in Large Language Models
Metareasoning in uncertain environments: a meta-BAMDP framework
Bridging Explainability and Embeddings: BEE Aware of Spuriousness
Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMind's Innovations
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation
Is Your LLM Really Mastering the Concept? A Multi-Agent Benchmark
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
PhysUniBench: A Multi-Modal Physics Reasoning Benchmark at Undergraduate Level
Scaling Towards the Information Boundary of Instruction Sets: The Infinity Instruct Subject Technical Report
Synthetic Homes: An Accessible Multimodal Pipeline for Producing Residential Building Data with Generative AI
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
AI Driven Discovery of Bio Ecological Mediation in Cascading Heatwave Risks
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
Measuring What Matters: The AI Pluralism Index
Unifying Deductive and Abductive Reasoning in Knowledge Graphs with Masked Diffusion Model
Retrieval- and Argumentation-Enhanced Multi-Agent LLMs for Judgmental Forecasting (Extended Version with Supplementary Material)
PreferThinker: Reasoning-based Personalized Image Preference Assessment
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents
The Specification Trap: Why Content-Based AI Value Alignment Cannot Produce Robust Alignment
Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale
Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms
Meta Context Engineering via Agentic Skill Evolution
World of Workflows: A Benchmark for Bringing World Models to Enterprise Systems
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility
Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink
PieArena: Frontier Language Agents Achieve MBA-Level Negotiation Performance and Reveal Novel Behavioral Differences
Progress Constraints for Reinforcement Learning in Behavior Trees
EventCast: Hybrid Demand Forecasting in E-Commerce with LLM-Based Event Knowledge
MePo: Meta Post-Refinement for Rehearsal-Free General Continual Learning
From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent
Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning
Why do we Trust Chatbots? From Normative Principles to Behavioral Drivers
Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning
SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning
ClinAlign: Scaling Healthcare Alignment from Clinician Preference
CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Structured Sentiment Analysis as Transition-based Dependency Graph Parsing
Games with Payments between Learning Agents
Tensor learning with orthogonal, Lorentz, and symplectic symmetries
Towards Better Code Understanding in Decoder-Only Models with Contrastive Learning
Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data
Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping
When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs
Symmetrization Weighted Binary Cross-Entropy: Modeling Perceptual Asymmetry for Human-Consistent Neural Edge Detection
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs