Exploring Silent Data Corruption as a Reliability Challenge in LLM Training
Analysis of silent data corruption during LLM training on hardware, studying gradient corruption impacts and detection mechanisms.
Spectral Compact Training method reduces LLM training memory footprint by replacing dense weight matrices with truncated SVD factors.
Transformer-based model with biomarkers for immunotherapy response prediction, improving generalization across diverse cancer datasets.
Open-ended narrative framework for wearable human activity recognition using compositional, unscripted activities instead of closed-set classification.
ThoughtSteer backdoor attack exploiting continuous reasoning in language models that reason silently in hidden states without emitting tokens.
Method to reduce neural network multi-class classification complexity from O(n) to O(1) by leveraging known geometric properties of the latent space.
Optimus training library for pretraining mixture-of-experts LLMs at exascale on the Aurora supercomputer, demonstrating scaling to thousands of GPU tiles.
Deep learning method for plant phenology prediction using domain adaptation to improve climate change forecasting in ecological systems.
Experimental evaluation of a Mixture-of-Experts orchestrated by the Free-Market Algorithm, with cost-penalized fitness for domain adaptation.
Optimal decomposition technique for low-rank approximation of LLM weights enabling efficient fine-tuning and inference.
Method for language agents to optimize test-time adaptation policies through iterative refinement during inference.
Reinforcement learning approach with verification for iteratively improving LLM policies based on actual performance gains.
Framework for human-AI cooperation that models fatigue-induced performance degradation in learning-to-defer systems.
Compositional embedding method for protein networks using additive sequence models on biological interaction data.
Orthogonal learning approach for estimating heterogeneous long-term treatment effects combining experiments and observational data.
Method for verifiable repair of transformer vulnerabilities to adversarial perturbations with inner-layer guarantees.
Flow-based reinforcement learning policy with distributional approach for capturing multimodal solutions in trajectory optimization.
Graph partitioning technique using embeddings to enable scalable distributed training of graph neural networks.
Transfer learning methodologies for Bayesian network structure learning with scarce data.
Model-based learning approach for finite-window policies in partially observable Markov decision processes.
Method for efficiently evaluating LLM downstream performance during training without expensive full inference.
Algorithmic approach to multi-objective optimization via hashing and randomization for identifying Pareto frontiers.
Theoretical analysis of dependency networks using information geometry perspective for modeling complex systems.
Data-driven sports training framework using skeleton-based biomechanical analysis and motion modeling for dart throwing.
AI pipeline extracting building elevation data from street-view imagery with ML imputation for flood risk assessment.
Analysis showing how irrelevant context degrades LLM reasoning performance despite test-time scaling capabilities.
Generative model approach using adversarial distribution alignment to bridge simulation-to-experiment gap in scientific domains.
ORCA framework calibrating LLM sampling through conformal prediction to improve test-time reasoning efficiency and generalization.
Physics-informed neural network combining diffusion-advection with evidential fusion for air quality forecasting.
Multiscreen mechanism for language models enabling absolute query-key relevance assessment beyond relative attention redistribution.
CliffSearch agent framework for scientific algorithm discovery combining LLM-guided search with structured evolution of theory and code.
Mathematical framework analyzing what determines forecast skill in AI weather prediction, emphasizing training methodology over architecture.
LAPIS-SHRED method for reconstructing spatio-temporal dynamics from sparse observations using shallow recurrent decoders.
PhoneticXEUS model for robust multilingual phone recognition trained on large-scale data with pretrained representations.
LLM-based recruitment tool identifying requisition-specific competencies through dynamic few-shot prompting and reflection.
Text-based harmonization approach using LLMs to unify multi-institutional EHR data without explicit schema standardization.
Modular RL framework with decomposable reward modeling and realistic environment design for Forex trading applications.
Mathematical analysis establishing isomorphism between ant colony behavior and ensemble learning methods like boosting.
LLM-based approach to identify enterprise architecture debt indicators from unstructured documentation in organizations.
Framework combining vision language models with RL for dense reward generation in long-horizon robotic tasks to reduce manual reward engineering.
GenoBERT uses transformers for reference-free genotype imputation without ancestry bias.
Theoretical analysis of scaled gradient descent for low-rank matrix recovery with optimal sampling complexity.
Analysis of transformer forecast collapse under squared loss for financial time series with weak conditional structure.
Genetic algorithms for multi-objective feature selection in cancer biomarker discovery from omics data.
HIVE framework for hierarchical pre-training of vision encoders integrated with large language models for vision-language alignment.
Transformer-based models for detecting software vulnerabilities in C/C++ using program slices.
Finite-time convergence analysis of two-time-scale stochastic approximation algorithms.
Performance regression detection for CI systems using risk-aware batch testing and predictive models.
Energy-based models for stable system identification in physical AI using port-Hamiltonian dynamics.
Data-driven reachability analysis for dynamical systems using diffusion models with PAC theoretical guarantees.