RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
RoboPARA is an LLM-driven framework for dual-arm robot task planning that optimizes parallelism across tasks.
BemaGANv2 is a GAN-based vocoder for high-fidelity long-term audio generation in text-to-music and text-to-audio systems, evaluating discriminator combination strategies.
Co-LoRA: federated learning framework for personalizing heterogeneous multi-modal models across clients without privacy risks.
LLM-based 3D scene planner that relaxes goals with commonsense reasoning to generate feasible actions in complex environments.
Semi-self-supervised learning approach for instance segmentation that reduces annotation requirements for densely-packed objects.
Adaptive batch-wise sample scheduling for Direct Preference Optimization of LLMs accounting for model state evolution during training.
Motivation-enhanced reinforcement learning framework for efficient reasoning model finetuning with verifiable rewards on complex tasks.
Zero-shot unified image restoration using latent diffusion recurrent posterior sampling without paired training data.
Analysis of Physics-Informed Neural Networks under noisy data, establishing conditions for low empirical risk on Hamilton-Jacobi-Bellman equations.
Mamba Snake framework using state space modeling for unified multi-scale medical image segmentation across organs.
Vision Transformer-based framework for post-disaster affected area segmentation from satellite imagery with confidence indexing.
Survey on flow matching generative models applied to biological discovery including protein design, molecule generation, and drug discovery.
User Goal Alignment framework addressing LLM-based user simulators' inability to maintain goal-oriented behavior in multi-turn conversations.
CauKer algorithm for pre-training time series foundation models using causally-generated synthetic data for sample efficiency.
Graph foundation models trained on graph properties for improved cross-domain generalization in graph classification tasks.
Video-LLM framework using event-centric episodic memory to handle long-form video understanding beyond context window limits.
Foundation model for industrial sensor signals with frequency-aware hierarchical encoding supporting arbitrary sampling rates.
Entropy-driven curriculum learning approach for multi-task human mobility prediction from mobile device data.
Optimal transport-enhanced graph networks for aspect-based sentiment analysis using syntactic-semantic structures.
Multi-view diffusion policy for coordinated mobile manipulation control with manipulability awareness in unstructured environments.
Robotic skill composition using scene graphs, enabling generalist robots to solve complex tasks robustly under distribution shift.
Single-image implicit surface reconstruction for robotics obstacle avoidance and motion generation.
Surrogate-free multi-agent reinforcement learning framework using generative models instead of explicit policy populations.
Active learning method for correlation clustering in cold-start settings without initial pairwise similarity data.
Transformer architecture using cross-state transition attention for robust robotic manipulation from demonstrations.
Prompting protocol combining objection-raising and revision mechanisms to improve LLM reasoning and self-correction.
Multi-turn red-teaming approach using tree-based dialogue and reinforcement learning for discovering LLM vulnerabilities.
Scalable methods for computing Wasserstein barycenters of probability measures via gradient flows.
Hardware-software co-design framework for efficient multimodal model inference on battery-powered edge devices.
Membership inference attacks on LLM tokenizers, treated as a privacy attack surface distinct from attacks on the model itself.
Backdoor attack on vision-language-action models demonstrating action-level behavioral manipulation vulnerabilities.
World model and MPC framework for humanoid robot contact planning combining learned representations with sampling-based control.
Open-source corpus and tools for training fully open multimodal LLMs with improved data quality and reasoning.
Study on unintended reasoning behaviors in reinforcement-learning-trained LLMs and chain-of-thought monitoring.
Continual learning method for audio-visual segmentation addressing modality entanglement in sequential tasks.
Framework enabling LLMs to perform tabular prediction via structural priors and reasoning-focused optimization.
Evaluates driving world models as synthetic data generators for autonomous vehicle perception tasks.
Transformer framework for class-agnostic object counting using visual repetition patterns.
Navigation system using 3D Gaussian Splatting memory for multi-modal visual goal navigation in robotics.
SwiftEmbed: production text embedding system achieving 1.12ms latency and 50k req/s using static token lookup in Rust.
Research on vectorized, parallelized online POMDP planning for autonomous robot decision-making under partial observability.
Research on detecting AI-generated images via diffusion model snap-back reconstruction forensics. Addresses Stable Diffusion and DALL-E detection.
Comparative study of interpretable fuzzy reasoning vs deep learning for motor-imagery EEG classification in brain-computer interfaces.
Research paper on federated learning of mixture-of-experts models for mobile edge computing and resource-constrained devices.
FATE benchmark series for formal algebra theorem proving at multiple difficulty levels. Evaluates LLM capabilities on mathematical reasoning beyond contest problems.
Detection method for AI-generated images using contextual anomaly estimation in masked autoencoders. Extends the DetectGPT approach from the text domain to the vision domain.
HatePrototypes: Interpretable representations for hate speech detection covering implicit and explicit hate. Addresses content moderation with transferable embeddings.
UnfoldLDM combines deep unfolding networks with latent diffusion models for blind image restoration. Model-based interpretable approach to image processing.
Probabilistic certification framework improving SmoothLLM defense against LLM jailbreaking attacks. Addresses robustness guarantees with realistic assumptions.
Yo'City: Agentic framework using self-critic expansion for personalized, boundless 3D city generation. Demonstrates AI agent reasoning in creative generation tasks.