MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
MambaVoiceCloning uses state-space models and diffusion for efficient text-to-speech synthesis without attention layers.
Medical study using voice features to predict health deterioration in chronic heart failure patients via ML models.
Studies grokking in feature learning kernels via Recursive Feature Machine, showing data symmetry breaking is necessary for generalization.
Measure-valued neural network learning McKean-Vlasov dynamics from particle trajectories using cylindrical feature embeddings.
Reverse-engineers gpt-oss-20b tool definitions from in-distribution calls and builds native harmony agent harness with open-source implementation.
Proposes decision-centric framework separating control decisions (answer, retrieve, tool use) from LLM generation in agent architectures.
EgoNav: Humanoid robot navigation system trained on 5 hours of human walking data using diffusion models and frozen DINOv3 backbone.
Shapley-guided approach using derivative-free optimization to repair DNNs affected by backdoors, adversarial attacks, and unfairness.
Theoretical work on reconstructing latent manifold geometry from random geometric graphs beyond volumetric constraints.
Studies policy gradient methods for multi-agent reinforcement learning in partially observable Markov potential games.
Compares U-net and Transformer-based segmentation for detecting multiple sclerosis lesions on 7-tesla MRI.
CheXOne: Vision-language foundation model for chest X-ray interpretation with explicit reasoning about visual evidence.
Introduces Uni-SafeBench, a safety benchmark for unified multimodal large models testing both understanding and generation capabilities.
Extends scenario approach theory for multi-criteria data-driven decision-making with probabilistic robustness guarantees.
Framework for robot imitation learning using multi-camera view scaling to improve generalization from limited expert demonstrations.
Proposes inverse-free sparse variational Gaussian processes using only matrix multiplications for low-precision parallel hardware.
Studies trade-off between pretraining corpus size and retrieval-augmented generation for language models under fixed data budgets.
CircuitProbe predicts reasoning circuits in Transformers from activation statistics in under 5 minutes, achieving 3-4 orders of magnitude speedup over brute-force methods.
Benchmarks State-Space Models (Mamba) against Transformers and BiLSTM for historical newspaper OCR, addressing the quadratic complexity of Transformer attention.
CEFR-aligned framework with fuzzy C-means for automated assessment of programming skills in Scratch.
Stochastic Attention inspired by connectome topology provides linear-time expressive attention mechanism.
PG-IPRO algorithm for interactive multi-objective route planning with accessibility preferences.
Study shows multimodal LLMs fail at detecting 3D spatial inconsistencies across multiple views.
Deconfounding scores for causal effect estimation preserve treatment-control distinctions in high dimensions.
PARE framework simulates realistic user interactions for evaluating proactive AI agents and assistants.
Divide-and-conquer approach for scalable matrix mechanisms in differential privacy and synthetic data.
StanceMoE uses mixture-of-experts for actor-level stance detection in geopolitical texts.
Dataset and analysis of autonomous coding agent contributions to real-world GitHub projects over time.
Quantum annealing for VAEs with general Boltzmann priors enables structured latent variable interactions.
MyPhoneBench evaluates privacy compliance of mobile phone-use agents completing benign tasks.
Model-based RL controls focal plane wavefront for exoplanet imaging on extremely large telescopes.
Query-conditioned evidential keyframe sampling for efficient multimodal LLM-based long-form video understanding.
ProOOD method for 3D semantic occupancy prediction handles out-of-distribution inputs and long-tailed class bias.
OptoLlama uses masked diffusion models for inverse design of optical multilayer thin films.
MoA-DepthCLIP adapts CLIP vision-language model for monocular depth estimation with parameter-efficient adapters.
PaperRecon framework evaluates quality and hallucination risks in papers generated by AI coding agents.
RL policy adaptation for robotic manipulation under distribution shift using bounded extremum seeking.
NARCBench for detecting multi-agent collusion using multi-agent interpretability on LLM agent activations.
S0 tuning: zero-overhead adaptation of hybrid recurrent-attention models, outperforming LoRA on code generation.
Function-based uncertainty quantification for safe learning-based control in safety-critical systems.
Learning to generate mixed quantum states prepared by shallow channel circuits in trivial phases.
RELISH: lightweight architecture for text regression with LLMs using iterative latent state refinement.
Survey on Graph Neural Network acceleration techniques across algorithms, systems, and customized hardware.
RobustRAG: defense framework with certifiable robustness against retrieval corruption attacks on RAG systems.
Inductive manifold learning approach for nonlinear dimensionality reduction with local and global structure.
Domain adaptation with distribution shifts and unobserved confounding using linear structural causal models.
Topological Alignment Spectra method for analyzing multi-scale structural relationships in neural network representations.
Gaussian Process interpretation of wide neural networks with observation noise and arbitrary prior means.
Gradient-based hyperparameter learning via evidence lower bound objective from Bayesian variational methods.
Transformer-based decoder for Varshamov-Tenengolts codes correcting insertion, deletion, and substitution errors.