ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
ReactDance: Diffusion-based method for generating reactive dance conditioned on lead dancer motion with fine-grained spatial interactions.
Benchmark evaluating LLMs on rigorous causal inference tasks involving statistical pitfalls relevant to medicine, economics, and public policy.
ShIOEnv: Gymnasium environment for grammar-constrained synthesis and modeling of command-line interface behavior with shell input-output data.
Novel framework using multi-kernel Boolean parameters for weight binarization in LLMs to improve efficiency without full-precision latent weights.
SealQA benchmark for evaluating search-augmented LLMs on fact-seeking questions with conflicting or noisy web search results.
EDINET-Bench: New benchmark dataset evaluating LLMs on complex financial document analysis tasks using Japanese financial statements.
Physics-informed deep learning using geometric priors like Hamiltonian symmetries for structure-preserving neural network models in robotic systems.
MuRating framework transfers English data-quality signals to score documents in 17 languages for multilingual LLM pretraining.
Research generalizes EDM to arbitrary-noise diffusion models, analyzing design space beyond Gaussian noise for image restoration tasks.
Quantum EM algorithm for training quantum Boltzmann machines that circumvents the barren plateau problem in quantum machine learning.
Study shows LLM ranking systems are sensitive to small changes in preference data: top model rankings can change after dropping only a few preferences.
Comprehensive evaluation of how weight and activation quantization affects model bias across stereotypes, fairness, toxicity, and sentiment.
Research on optimal alignment of acoustic and linguistic representations in pre-trained models for automatic speech recognition.
BabyHuBERT: Self-supervised speech model trained on 13,000 hours of multilingual child-centered recordings for speaker segmentation.
Framework combining diffusion models with impedance control for robot learning in contact-rich manipulation tasks.
Noise-to-Notes reformulates automatic drum transcription as a conditional generative task using diffusion modeling.
BridgeDrive applies diffusion-based planning with expert behavior anchors for closed-loop autonomous driving trajectory planning.
BeyondBench framework uses algorithmic problem generation for contamination-resistant evaluation of reasoning in language models.
SphereAR addresses variance collapse in continuous-token autoregressive image generation by constraining latents to a hypersphere.
Theoretical analysis of quantitative convergence of shallow neural networks trained via gradient descent to Gaussian processes.
Research on NVFP4 quantization approach for efficient LLM pretraining, reducing compute and energy requirements for frontier models.
VidGuard-R1 uses reasoning MLLMs and reinforcement learning to detect AI-generated videos with human-interpretable explanations.
Research on self-supervised novel view synthesis (NVS) identifies transferability as a key criterion for true NVS capability across video sequences.
Safety filtering for reinforcement learning using Control Barrier Functions to enforce dynamic safety constraints during training.
Bayesian inference approach for inverse problems using discrete loss optimization with applications in data assimilation and imaging.
Schrödinger Bridge Mamba model for one-step speech enhancement combining diffusion-based training with Mamba architecture.
Theoretical analysis of how data geometry controls generalization in overparameterized two-layer ReLU networks during training.
Framework for detecting when influential data subsets have excessive impact on model conclusions beyond random variation.
Multimodal foundation model for accelerating numerical simulation of stochastic differential equations via neural network-based error correction.
MotionStream: Real-time streaming video generation with motion controls achieving 29 FPS on a single GPU.
CytoNet: Foundation model for analyzing cellular architecture in human cerebral cortex from histological images.
CoRPO: Adds correctness bias to GRPO reinforcement learning for improved reasoning and generalization in LLMs.
ObAct: Imitation learning framework for active vision in dual-arm robots using 3D Gaussian Splatting.
Machine learning framework for organic photovoltaic material discovery using dual-pronged donor-acceptor modeling.
Adversarially guided diffusion sampling method with control energy regularization for improved sample quality.
GRAND: Multi-agent path finding system using reinforcement learning for robot fleet task scheduling and dispatch.
Phase-Preserving Diffusion model for structure-aligned generation in image-to-image translation tasks.
Uncertainty quantification method for aggregating multiple predictive models in conformal prediction framework.
ReFusion: Masked diffusion LLM combining parallel inference with autoregressive KV caching for faster generation.
AMPEND-LS: Agentic multi-persona LLM framework for multimodal fake news detection with evidence grounding.
Parallel Token Prediction: Framework for generating multiple LLM tokens in a single forward pass via deterministic functions.
Machine learning classification of cellular malignancy using electrical impedance bioelectric signatures.
EmboTeam: LLM-based multi-robot task planning framework using behavior trees and PDDL for embodied AI.
AI agent system for continuous long-horizon egocentric video understanding from wearable devices.
Theoretical convergence analysis of Muon optimizer for nonconvex optimization problems.
Bayesian inference sampling method for inverse problems with approximate operators using Metropolis-Hastings.
LatentChem: LLM framework for chemical reasoning using latent representations instead of chain-of-thought text.
Theoretical study of online conformal prediction under distribution drift for non-stationary data streams.
Privacy-preserving person re-identification system using transformer architecture for decentralized surveillance across urban cameras.
Benchmarking system for multimodal respiratory audio question answering evaluating robustness under realistic heterogeneity.