UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation
Continual multi-task training framework for universal audio representation across speech, environmental sounds, and music.
Deep learning framework for automated refinement of protein structures using cryo-EM density maps with diffusion models.
Autoregressive decoding model for reconstructing visual information from EEG brain signals using diffusion-based approach.
CeRA improves low-rank adaptation for LLM fine-tuning by adding manifold expansion via gating and dropout, addressing linear limitations.
Combines behavioral and textual relevance signals using LLMs to improve app store search ranking at scale.
First systematic 4-bit quantization-aware training study for attention mechanisms enabling end-to-end FP4 computation on emerging GPUs.
PEPA: embodied AI agent framework with personality-driven persistent autonomy enabling self-sustaining goals without external task specification.
Conformal prediction framework providing finite-sample coverage guarantees for LLM-based medical entity extraction across clinical domains.
Deep learning pipeline for glottal area segmentation in laryngeal videoendoscopy with detection gating for clinical pathology assessment.
Satirical paper presenting pluralistic alignment for LLMs in implausible mulching context; appears to be parody without technical substance.
Architectural model for trustworthy AI-assisted software via human-certified module repositories ensuring reliability of AI-assembled systems.
iGVLM: framework enabling dynamic instruction-guided vision encoding in LVLMs for task-specific visual understanding.
ITO: framework for image-text contrastive learning using multiple alignment and training-time fusion to reduce modality bias.
Interpretability method using attention maps to localize motion concepts in video diffusion transformers.
Theoretical analysis proving Adam optimizer outperforms SGD through second-moment normalization creating sharper gradient tails.
Compositional Probe Decomposition method analyzing how molecular foundation models separate geometric and compositional information.
Conditional normalizing flow method for reconstructing deep brain EEG signals from scalp measurements.
MASS: meta-learning framework enabling LLMs to self-adapt at test time by generating synthetic training data for improved downstream performance.
ZipMap: feed-forward transformer model achieving linear-time 3D reconstruction from multiple images via stateful processing.
Survey of 378 professional visual artists on workplace impacts of generative AI adoption and career concerns.
Neuro-symbolic approach combining LLMs with deterministic fact ledgers and hallucination detection for financial reasoning without arithmetic errors.
Explainability method using counterfactual explanations to understand time-series clustering transitions.
Weakly-supervised method for localizing and describing video events using Gaussian masking and caption augmentation techniques.
Research on merging task-specific models into consolidated ones, analyzing parameter competition and domain generalization effects.
vLLM Hook v0: plugin enabling programmable access to LLM internals for test-time alignment and inference optimization in the vLLM serving engine.
Interpretability study on attention sinks in LLMs, explaining why models allocate disproportionate attention to specific tokens including first token bias.
FuzzingRL approach using reinforcement learning for fuzz testing Vision Language Models to automatically generate failure-inducing queries.
Switchable Activation Networks that dynamically select activation functions for computational efficiency in LLMs and vision-action models during inference.
Khatri-Rao Clustering approach for data summarization using centroid-based clustering with reduced redundancy in prototypes.
Method to align LLM confidence scores with correctness using output token probabilities for reliable error detection and hallucination identification.
Set Transformers with temporal and variable-type attention biases for asynchronous clinical time series in EHR data without imputation.
LegoNet compression technique for neural networks using block weight clustering to reduce memory footprint for embedded device deployment.
Conditional Randomization Test approach for valid feature-level hypothesis testing and p-values in tabular foundation models.
CapTrack benchmark for evaluating multi-faceted forgetting in LLM post-training beyond parametric knowledge loss, addressing domain adaptation challenges.
Research showing majority-voting and ensemble inference methods fail to improve LLM truthfulness without external verification, unlike in math/code domains.
OptiRoulette meta-optimizer that dynamically selects update rules during training via warmup locking and random sampling. Torch-compatible drop-in component achieving 5.3x faster convergence.
Theoretical analysis of diffusion models and flow matching using unified representation via linear equations. Discusses correlation between noisy data and predictions.
Annealed Co-Generation framework for multivariate scientific data generation using progressive pairwise diffusion modeling instead of joint high-dimensional modeling.
RACER system for efficient multi-model LLM routing, formulated as a risk-aware optimization problem. Extends base routers to optimize the cost-performance trade-off.
Novel language model combining autoregressive and diffusion-based generation through latent trajectory modeling with evolving balance parameter.
Framework for zero-shot prediction in multiplex biological networks using topology-aware distillation and adaptation methods.
Framework for token-efficient reinforcement learning in LLMs. Proposes NAT to reduce computational cost of backpropagation over long chain-of-thought trajectories during training.
Research on vulnerabilities in Process Reward Models used in LLM reasoning pipelines. Introduces diagnostic framework to quantify adversarial exploitability and fluency-logic dissociation.
Empirical comparison of ARIMA, LSTM, BiLSTM, and Transformer for short-term power load forecasting.
Survey of Group Relative Policy Optimization for aligning generative models with human preferences.
Graph neural networks for imputing missing pavement condition data in road networks.
Grouter: decoupled routing method for accelerating Mixture-of-Experts training with structural priors.
Leakage-safe graph feature extraction for fraud detection in temporal transaction networks.
Theoretical analysis of polynomial approximation and Heaviside/sigmoid expansions in neural networks.
Graph property inference in small language models: effects of representation on structured reasoning.