Singing Syllabi with Virtual Avatars: Enhancing Student Engagement Through AI-Generated Music and Digital Embodiment
Educational approach using AI-generated singing and virtual avatars to present course syllabi for improved student engagement.
TaoSR1 deploys LLMs directly for e-commerce query-product relevance prediction using chain-of-thought reasoning with error mitigation.
Framework for adaptive chain-of-thought compression in LLMs reduces computational costs while maintaining reasoning quality on software engineering tasks.
VSSFlow: unified flow-matching framework for both video-to-sound and visual text-to-speech generation.
VoiceBridge: one-step latent bridge model for general speech restoration from diverse distortions at 48 kHz.
v-HUB benchmark for evaluating multimodal LLMs on humor understanding using non-verbal short videos.
Latent Speech-Text Transformer improves compute efficiency of auto-regressive speech-text models through latent representation compression.
NavSpace benchmark with 1,228 trajectory-instruction pairs evaluates spatial reasoning and perception capabilities of embodied navigation agents.
RECODE framework uses code generation and derendering for visual question answering on structured visuals like charts and diagrams.
REAP demonstrates that expert pruning outperforms expert merging for compressing Mixture-of-Experts models on generative tasks.
RL-100 framework combines diffusion visuomotor policies with reinforcement learning (clipped PPO) for real-world robotic manipulation tasks.
Reasoning framework using LLMs with permutation relative policy optimization for interpretable tabular prediction with structural priors.
Vision-language-action model (FALCON) incorporating 3D spatial foundation priors for improved grounding and generalization in real-world robotic tasks.
Framework for synthesizing hand manipulation sequences with language instructions using discrete human-object interaction representations.
Vectorized parallel algorithm for POMDP planning in autonomous robots, leveraging modern hardware parallelization.
Graph domain-incremental learning method for updating models across multiple graph domains using knowledge disentanglement and preservation.
Structured matrix scaling approach for post-hoc multi-class classifier calibration beyond standard temperature scaling.
Data valuation method for time series foundation models using in-context fine-tuning to efficiently assess training data quality.
Multi-round entity-level reasoning segmentation task for medical images using text prompts, enabling iterative dialogue-based medical image analysis.
Machine learning method using time-series foundation models with in-context learning for bearing-health classification without fine-tuning.
LLM-based chatbot for automated generation and solving of electromagnetic simulation models.
VLM-based method for human-object interaction detection addressing long-tail bias with adaptive diversity.
Periodic asynchrony training approach for accelerating LLM reinforcement learning by decoupling inference and training.
Study of universal adversarial patch attacks on vision-language-action models controlling robots.
ELERAG enhances RAG systems with entity linking for improved factual accuracy in specialized educational domains.
Diffusion-based framework for forecasting electromagnetic field levels in wireless networks.
Geometry-aware indexing method for billion-scale approximate nearest neighbor search on disk-resident vectors.
CRANE analyzes language-specific neurons in multilingual LLMs using causal relevance methods for interpretability.
Bayesian generative modeling framework enabling flexible conditional inference on arbitrary variable partitions.
Multi-sequence ophthalmic angiography classification using state-space models for medical image analysis.
Automated system for generating and resolving diverse forecasting questions for AI evaluation and benchmarking.
Vision-language model for automated web accessibility violation detection and HTML correction.
Deep learning method for monocular surface normal estimation from single RGB images.
Infusion framework uses influence functions to edit training data and induce targeted model behavior changes.
Energy-efficient continual learning method for spiking neural networks on neuromorphic vision systems.
B-DENSE improves diffusion model distillation by using dense trajectory supervision instead of sparse steps.
Deep reinforcement learning approach for robust control of mechanical systems handling multiple sources of uncertainty.
Diffractive optical neural processor with reconfigurable nonlinearity for energy-efficient optical domain processing.
Research on diffusion language models addressing the factorization barrier to enable efficient parallel token generation.
OrthoAI combines 3D tooth segmentation with biomechanical reasoning for clear aligner orthodontics using sparse-supervision learning.
Dual-pipeline bird image segmentation framework combining Grounding DINO 1.5, YOLOv11, and SAM 2.1 for zero-shot and supervised segmentation.
Pri4R approach equipping Vision-Language-Action models with an implicit understanding of world dynamics through privileged 4D representation learning.
GOME: MLE agent framework replacing tree search with gradient-based optimization for machine learning engineering tasks using LLM reasoning.
Coordinated Boltzmann MCTS for decentralized multi-agent planning, replacing deterministic UCT with stochastic Boltzmann policy for sparse reward environments.
Communication mechanism (RMHA) for decentralized multi-robot path planning using attention over Manhattan distances for spatial-aware coordination.
Training-free method (PlaneCycle) for lifting 2D foundation models to 3D volumetric data without adapters or retraining via cyclic spatial aggregation.
Study of the grokking phenomenon through architectural modifications, identifying topology-based degrees of freedom that influence memorization vs. generalization phases.
Analysis of performative chain-of-thought in reasoning models, showing hidden beliefs diverge from generated reasoning tokens at task-specific difficulty levels.
Methods for eliciting truthful outputs from censored LLMs using honesty elicitation and lie detection, tested on models trained to conceal information.
OptiRoulette: meta-optimizer that dynamically selects update rules during training for faster convergence, packaged as a PyTorch-compatible component.