Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models
Advocates integrating causal methods into ML to balance trustworthiness objectives like fairness, privacy, robustness, and explainability.
Proposes the Dual Filter framework, which connects Hidden Markov Models to the transformer decoder architecture for causal nonlinear prediction.
Applies unsupervised anomaly detection to ultrafast electron diffraction data to identify beam instabilities in materials science experiments.
Introduces the Guided Policy Optimization framework for RL in partially observable environments, using privileged information from simulators.
Analyzes the oversmoothing problem in deep Graph Neural Networks and explores why networks fail to learn non-oversmoothed representations.
Proposes uncertainty estimation improvements to Residual Reinforcement Learning for faster adaptation of pretrained policies under sparse rewards.
Data condensation approach for training diffusion models on a minimal computational budget by constructing smaller synthetic training datasets.
Graph transformer architecture designed for invariant learning to improve out-of-distribution generalization on graph-structured data.
Theoretical study of implicit bias in deep neural network training showing gradient flow induces learning of lower-dimensional parameter structures.
Continual learning framework with unified prompt pools for medical imaging tasks, addressing domain-specific challenges in adaptive AI.
Analysis of compositional generalization mechanisms in conditional diffusion models, studying length generalization on controlled image generation tasks.
Low-rank approximation technique for accelerating machine learning models predicting mechanical properties of heterogeneous materials.
Lightweight meta-learning method using three parameters to dynamically adjust sample loss weights for noisy training, fairness, and synthetic data utilization.
Hybrid pre-training approach that uses low-rank adapters alongside full training to reduce the computational cost of vision transformer training.
Graph-based method for forecasting irregular multivariate time series in healthcare and finance with adaptive spatio-temporal interactions.
Method for robustly fine-tuning non-robust pretrained models, using epsilon-scheduling to achieve adversarial robustness and task adaptation simultaneously.
Neural operator architecture combining spectral and coupling methods for efficiently learning partial differential equation dynamics.
Analysis of transformer internals distinguishing recall from reasoning mechanisms through layer-wise attention and activation patterns for interpretability.
Mathematical proof that transformer language models are injective, enabling exact input recovery from representations despite nonlinear components.
Benchmark framework for evaluating neural compression and representation learning on Earth observation satellite imagery tasks.
Method for unlearning harmful content from LLMs by analyzing belief redistribution in probability space, avoiding unwanted side effects of gradient ascent.
Theoretical analysis of data scaling laws in linear regression when training multiple epochs on limited datasets, relevant to LLM training efficiency.
Global sensitivity analysis technique for engineering design using individual conditional expectations to improve interpretability of black-box models.
Method to improve LLM consistency and reliability across semantically equivalent prompts using group relative policy optimization for business-critical applications.
Study demonstrating that ensemble diversity across language models mitigates knowledge collapse from training on model-generated outputs.
Theoretical analysis proving structural incompatibility between differentiable sorting operators and rank normalization techniques.
Selective state-space networks on combinatorial complexes for higher-order graph learning using topological deep learning.
Mixture-of-experts approach with heterogeneous experts for capturing multi-scale temporal dynamics in long-horizon time series forecasting.
Integration of Koopman operator theory with transformer architectures for time series forecasting with learnable spectral parameterizations.
Decoding strategy for masked diffusion language models that dynamically adjusts token retention based on context coverage.
Interpretable image classification using hierarchical concept embeddings recovered from vision-language model latent spaces.
Benchmarking framework using roofline analysis to characterize performance of small language models on resource-constrained edge hardware.
Federated learning approach addressing heterogeneous graph structures in distributed GNN training across multiple clients.
Study of many-shot in-context learning as test-time adaptation for LLMs, analyzing benefits and reliability limits with open-source models.
Evaluation framework using proper scoring rules for assessing distributional predictions from tabular foundation models beyond point estimates.
Search procedure to identify optimal learning rate schedule shapes for neural network training across different workloads.
Continual pretraining of LLMs specialized for low-level embedded systems code generation, targeting underrepresented hardware domains.
Algorithm for approximating Gateaux derivatives in causal inference when distributions must be estimated from data.
Statistical methods for estimating sub-Gaussian distribution parameters using intrinsic moment norms in non-asymptotic learning.
Comparative analysis of softmax vs linear attention mechanisms in transformer architectures, examining computational efficiency tradeoffs.
Theoretical framework studying initialization and activation function scaling in neural fields for computer vision signal representation.
Latent diffusion models for geological parameterization and data assimilation, generating realistic geomodels with reduced variables for history matching.
Theoretical analysis of Fisher-Rao gradient flow dynamics under Wasserstein metric, establishing geodesic convexity and functional inequalities.
Nested deep learning foundation model for EEG/MEG spike detection in epilepsy diagnosis, addressing the limitations of manual spike identification.
Analyzes the brittleness of LLM safety alignment mechanisms, proposing the superficial safety alignment hypothesis to explain why standard alignment approaches are vulnerable.
Active causal structure learning framework enabling autonomous robots and AGI agents to dynamically construct causal models of environmental interactions.
Training paradigm integrating masked language modeling with next-token prediction to improve in-context retrieval in large language models.
Spectral filtering framework unifying dataset distillation methods by interpreting them as filters affecting feature correlation eigenvalues.
Dataset of peer review discussions and rebuttals to support automated manuscript evaluation and improve scientific publishing workflow efficiency.
Theoretical analysis of minimax learning rates for binary classification under geometric margin conditions with horizon function decision boundaries.