Can Vision-Language Models See Squares? Text-Recognition Mediates Spatial Reasoning Across Three Model Families
Study examining spatial reasoning limitations in vision-language models when localizing cells in binary grids without textual cues.
MadEvolve framework uses LLMs to discover and optimize scientific algorithms for cosmology problems via iterative code modification and parameter tuning.
ReLoop addresses silent failures in LLM-based optimization code through structured generation and behavioral verification, closing feasibility-correctness gaps up to 90%.
Vertical federated learning using saddle point reformulation with deterministic and stochastic methods for distributed training across devices.
Review of machine learning and physics-informed methods for coronary fractional flow reserve assessment from imaging.
Benchmark evaluating 50+ audio embedding models across 30 tasks in speech, music, and audio-text reasoning in 100+ languages.
Construct-and-Refine method for efficient constraint handling in neural solvers for complex routing problems.
Quantum graph convolutional networks for unsupervised learning on NISQ hardware with reduced circuit depth and qubit requirements.
Language-guided program optimization using LLMs for automated heuristic design in combinatorial optimization problems.
CLAA, a cross-layer attention aggregation technique that accelerates the LLM prefill stage using token ranking.
Partial identification method for estimating population quantities from missing-not-at-random feedback using pretrained model embeddings.
Implicit cooperation MARL approach enabling decentralized agent coordination in local energy markets without direct communication.
MARLEM, an open-source multi-agent RL simulation framework for studying implicit cooperation in decentralized energy markets.
Systematic evaluation of LLM long-context reasoning limits in automated bug fixing using SWE-bench Verified.
LGQ, a discretization-geometry learning approach for scalable and stable image tokenization in visual generation.
Machine-learning weather emulators to study fast radiative climate feedbacks on weekly timescales.
Evolutionary Context Search enables LLMs to acquire new knowledge post-deployment via improved retrieval-augmented generation.
Multifaceted learnable index approach for ANN-based retrieval in large-scale recommendation systems.
ECDF clustering method for analyzing quality and distributional characteristics of LLM-based agent system responses.
CHAI uses cross-inference caching to accelerate text-to-video diffusion model inference while maintaining quality.
Density estimation study for mixture models using KL divergence with finite dictionaries of densities.
Missing-by-Design framework for revocable multimodal sentiment analysis with privacy-preserving parameter modification.
FlexATC, a communication-efficient distributed optimization framework for nonsmooth problems over networks.
EC-Net, a hyperbolic hypergraph framework for multimodal emotion and sentiment analysis using contrastive learning.
PAHF framework for training AI agents to learn and adapt to individual user preferences through continuous human feedback.
Proposes conjugate learning theory to characterize trainability and generalization in deep neural networks using convex duality.
EnterpriseGym Corecraft: high-fidelity RL environment simulating customer support with 2,500+ entities and 23 tools for training generalizable agents.
Multi-agent bandit framework for the submodular welfare problem, maximizing agent utilities under bandit feedback.
Design concepts for memory systems to support artificial superintelligence, exploring extraction and storage paradigms without novel methods.
Distributed training method for quantum neural networks using circuit cutting to decompose large circuits into smaller subcircuits.
Quantum data encoding framework using tensor networks to reduce circuit depth and resource requirements for quantum machine learning.
Convex gated probing method for faithfully evaluating audio SSL embeddings, improving transformer ranking on AudioSet benchmark.
Lightweight hierarchical transformer for efficient 3D medical image segmentation balancing accuracy with computational efficiency.
Parameter-efficient implicit neural representation architecture using learnable periodic activations inspired by subtractive synthesis.
Study on using neural audio codecs for audio deepfake detection with analysis of resynthesized waveform labeling methods.
STING benchmark for measuring multi-turn, multilingual LLM agent misuse over sequential steps with automated red-teaming.
Methodological overview of applying machine learning in epidemiology covering supervised/unsupervised learning principles and applications.
Acoustic maps, a spatial feature representation for detecting replay attacks in speaker verification using multi-channel recordings.
CAFE framework combining causal discovery with multi-agent reinforcement learning for automated feature engineering on tabular data.
RoboGene framework using diversity-driven agentic task generation to maximize robotic manipulation training data for VLA pre-training.
Framework for learning consumer preferences from partial ranking data using logistic choice probabilities and low-rank user-item factors.
Mechanistic analysis showing looped and depth-grown LLM architectures exhibit convergent depth-wise signatures, unifying reasoning approaches.
Online conformal prediction method for non-stationary data streams with unknown distribution drift using training-conditional cumulative regret.
Zero-shot classifier editing method enabling fine-grained video understanding by splitting coarse categories without retraining on new annotations.
Analysis of independent policy-gradient learning in N-player linear-quadratic stochastic differential games with global convergence guarantees.
Contrastive learning framework with attention-based feature adaptation for street-view image classification using vision-language models like CLIP.
Theoretical analysis of error propagation when recursively training diffusion models on synthetic data, showing performance degradation from distribution drift.
Novel explainability method for transformer models using context-aware layer-wise integrated gradients to interpret predictions by capturing inter-token dependencies.
Research on using LLMs as comparative evaluators with reliability weighting. Analyzes bias and inconsistency in LLM judgment aggregation.
Diffusion models for rare-event sampling in molecular dynamics. Scientific computing application with limited general AI interest.