Analysis of transformer internals distinguishing recall from reasoning mechanisms through layer-wise attention and activation patterns for interpretability.
Mathematical proof that transformer language models are injective, enabling exact input recovery from representations despite nonlinear components.
Benchmark framework for evaluating neural compression and representation learning on earth observation satellite imagery tasks.
Method for unlearning harmful content from LLMs by analyzing belief redistribution in probability space, avoiding unwanted side effects of gradient ascent.
Theoretical analysis of data scaling laws in linear regression when training for multiple epochs on limited datasets, relevant to LLM training efficiency.
Global sensitivity analysis technique for engineering design using individual conditional expectations to improve interpretability of black-box models.
Method using group relative policy optimization to improve LLM consistency and reliability across semantically equivalent prompts in business-critical applications.
Study demonstrating that ensemble diversity across language models mitigates knowledge collapse from training on model-generated outputs.
Theoretical analysis proving structural incompatibility between differentiable sorting operators and rank normalization techniques.
Selective state-space networks on combinatorial complexes for higher-order graph learning using topological deep learning.
Mixture-of-experts approach with heterogeneous experts for capturing multi-scale temporal dynamics in long-horizon time series forecasting.
Integration of Koopman operator theory with transformer architectures for time series forecasting with learnable spectral parameterizations.
Decoding strategy for masked diffusion language models that dynamically adjusts token retention based on context coverage.
Interpretable image classification using hierarchical concept embeddings recovered from vision-language model latent spaces.
Benchmarking framework using roofline analysis to characterize performance of small language models on resource-constrained edge hardware.
Federated learning approach addressing heterogeneous graph structures in distributed GNN training across multiple clients.
Study of many-shot in-context learning as test-time adaptation for LLMs, analyzing benefits and reliability limits with open-source models.
Evaluation framework using proper scoring rules for assessing distributional predictions from tabular foundation models beyond point estimates.
Search procedure to identify optimal learning rate schedule shapes for neural network training across different workloads.
Continual pretraining of LLMs specialized for low-level embedded systems code generation, targeting underrepresented hardware domains.
Algorithm for approximating Gateaux derivatives in causal inference when distributions must be estimated from data.
Statistical methods for estimating sub-Gaussian distribution parameters using intrinsic moment norms in non-asymptotic learning.
Comparative analysis of softmax vs. linear attention mechanisms in transformer architectures, examining computational efficiency tradeoffs.
Theoretical framework studying initialization and activation function scaling in neural fields for computer vision signal representation.
Latent diffusion models for geological parameterization and data assimilation, generating realistic geomodels with reduced variables for history matching.
Theoretical analysis of Fisher-Rao gradient flow dynamics under Wasserstein metric, establishing geodesic convexity and functional inequalities.
Nested deep learning foundation model for EEG/MEG spike detection in epilepsy diagnosis, addressing the limitations of manual identification.
Analysis of the brittleness of LLM safety alignment mechanisms, proposing a superficial safety alignment hypothesis to explain why standard alignment approaches are vulnerable.
Active causal structure learning framework enabling autonomous robots and AGI agents to dynamically construct causal models of environmental interactions.
Training paradigm integrating masked language modeling with next-token prediction to improve in-context retrieval in large language models.
Spectral filtering framework unifying dataset distillation methods by interpreting them as filters affecting feature correlation eigenvalues.
Dataset of peer review discussions and rebuttals to support automated manuscript evaluation and improve scientific publishing workflow efficiency.
Theoretical analysis of minimax learning rates for binary classification under geometric margin conditions with horizon function decision boundaries.
Prompt-adaptive Best-of-N alignment strategy using reward models to reduce computational cost of test-time alignment for language models.
Survey on integrating TinyML and LargeML for 6G networks, covering deep learning applications in mobile systems, autonomous vehicles, and smart services.
Attention-aware embedding initialization method for new tokens in LLMs without expensive retraining, addressing vocabulary limitations in specialized domains.
Conditional marked point processes for reliable object detection uncertainty quantification, addressing miscalibrated confidence scores in neural networks.
Self-supervised learning approach adapting joint embedding architecture from video to EEG signals for brain activity analysis with limited labeled data.
Quantum-informed ML framework combining quantum generative models with classical predictors for long-term spatiotemporal chaos prediction.
Supervised fine-tuning method to align LLM agents with rational and moral preferences in strategic economic games, addressing systematic behavioral deviations.
Generative approach to bid shading in real-time bidding advertising using non-convex surplus optimization instead of traditional two-stage methods.
Object-centric representations for visual RL policies using dynamic tokens to improve generalization under visual condition changes without fixed-size slots.
Security evaluation of ML model sharing frameworks and hubs, assessing vulnerabilities in loading shared models and security awareness gaps among practitioners.
Neural quantum states impurity solver for quantum embedding problems. Graph transformer-based NQS for solving Hamiltonians in quantum chemistry.
Dynamic Aware: out-of-distribution detection for trajectory prediction in autonomous vehicles. Adaptive multi-mode approach for distribution shift in AVs.
AutoClimDS: agentic AI system for climate data science. Knowledge graph-based workflows for discovering climate patterns from fragmented data sources.
Formal language theory applied to statistical learning. Proves subregular language classes are linearly separable with simple models.
DataMind: scalable data-analytic AI agents for automated discovery. Open-source agent framework handling diverse-format data files and multi-step reasoning.
HoneyBee: data curation approaches for vision-language reasoning datasets. Analyzes impact of context, content, and format on VLM reasoning capabilities.
CBF-RL: integrates control barrier functions into reinforcement learning training. Enforces dynamic safety constraints during RL policy training, not just at inference.