ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference
ScoutAttention optimizes LLM inference by pre-computing KV cache on CPU ahead of GPU execution to reduce memory constraints.
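The "layer-ahead" idea can be sketched as a producer/consumer pipeline: while the GPU works on layer i, a CPU thread prepares the KV entries for layer i+1. This is a minimal illustration of the overlap pattern only, not ScoutAttention's actual implementation; `precompute_kv` and `scout_pipeline` are hypothetical names, and the placeholder arithmetic stands in for real attention work.

```python
import threading
import queue

def precompute_kv(layer_idx):
    # Hypothetical CPU-side KV preparation for one layer (placeholder work).
    return {"layer": layer_idx, "kv": [layer_idx * 2]}

def scout_pipeline(num_layers):
    """Overlap CPU KV pre-computation for layer i+1 with 'GPU' work on layer i."""
    ready = queue.Queue(maxsize=1)  # single-slot buffer: CPU stays one layer ahead

    def scout():
        for i in range(num_layers):
            ready.put(precompute_kv(i))  # blocks until the consumer takes the slot

    t = threading.Thread(target=scout)
    t.start()
    outputs = []
    for _ in range(num_layers):
        kv = ready.get()                 # KV for this layer, prepared ahead of time
        outputs.append(kv["kv"][0] + 1)  # stand-in for GPU attention compute
    t.join()
    return outputs
```

The single-slot queue bounds memory to one pre-computed layer, which is the point of staying only one layer ahead rather than materializing the whole cache up front.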
Preconditioned attention mechanism addressing ill-conditioning in Transformer attention blocks for efficient training.
GSR-GNN framework for efficient training of deep graph neural networks on large circuit graphs with memory optimization.
Online optimization framework for learning Kalman filtering in partially observed linear dynamical systems with unknown models.
CDFormer hybrid deep learning model for lithium-ion battery remaining useful life prediction with temporal data augmentation.
OMD-Bench benchmark for testing multimodal model robustness by systematically breaking modality consensus.
Semantic Router DSL for declarative LLM inference routing with content signal analysis, privacy policies, and audit traces.
GibbsPCDSolver for scalable maximum entropy population synthesis from census data using persistent contrastive divergence.
Spectrogram-Enhanced Multimodal Fusion using Vision Transformers for multivariate time series commodity price forecasting.
JSON-LD metadata framework for embedding provenance information in computer vision datasets.
Active learning approach for tabular foundation models using in-context learning to reduce cold-start labeling costs.
Diagnostic method for detecting non-Markovian observation violations in reinforcement learning using prediction-based scoring.
K-Means anomaly detection for microcontrollers with distributed model-sharing workflow via Distributed Internet of Learning.
Conditional Factuality Control framework for LLM hallucination control via conformal sampling with conditional coverage guarantees.
LatentBiopsy: training-free method detecting harmful prompts by analyzing residual-stream activation geometry in LLMs.
K-Means clustering algorithm with Kempe swaps for semi-supervised constrained clustering under must-link and cannot-link constraints.
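The Kempe-swap mechanism itself is specific to that paper, but the constraint feasibility check that any must-link/cannot-link k-means variant needs (in the style of COP-KMeans) can be sketched as follows; `violates` is a hypothetical helper name.

```python
def violates(point, cluster, assignment, must_link, cannot_link):
    """Return True if assigning `point` to `cluster` breaks a constraint.
    `assignment` maps already-assigned points to cluster ids."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        # Must-link partner already placed in a different cluster -> violation.
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        # Cannot-link partner already placed in this same cluster -> violation.
        if other is not None and assignment.get(other) == cluster:
            return True
    return False
```

In a constrained assignment step, a point is placed in the nearest centroid's cluster only if this check passes; swap-based refinements then repair assignments that would otherwise be infeasible.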
Theoretical analysis of LayerNorm vs RMSNorm geometric constraints and their effect on neural network Bayesian complexity.
Image-to-CAD program synthesis using geometric feedback for bootstrapping alignment between visual and symbolic representations.
Modular framework and taxonomy for reinforcement learning with diffusion and flow models as policy representations.
KV cache compression via uniform angle quantization in Fast Walsh-Hadamard domain with per-layer precision allocation.
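The paper's angle-quantization scheme and per-layer precision allocation are not reproduced here, but the core rotation idea can be sketched: transform a KV vector into the Walsh-Hadamard domain (which spreads outliers across coordinates), quantize uniformly, and invert. The orthonormal FWHT is its own inverse, so round-tripping is a single transform each way; `quantize_kv_fwht` is a hypothetical name, and a simple uniform magnitude quantizer stands in for the paper's angle quantizer.

```python
import numpy as np

def fwht(x):
    """Iterative Fast Walsh-Hadamard transform (length must be a power of two)."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def quantize_kv_fwht(v, bits=4):
    """Rotate into the Hadamard domain, uniformly quantize, rotate back."""
    n = len(v)
    h = fwht(v) / np.sqrt(n)                    # orthonormal transform
    scale = np.abs(h).max() / (2 ** (bits - 1) - 1)
    q = np.round(h / scale)                     # uniform symmetric quantization
    return fwht(q * scale) / np.sqrt(n)         # self-inverse: forward = inverse
```

Because the normalized transform is orthogonal, quantization error in the Hadamard domain maps back with the same L2 norm, which is why rotation-then-quantize tolerates outlier-heavy KV distributions better than quantizing raw activations.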
Empirical study of 33 KV cache quantization methods for self-forcing video generation with memory optimization.
Mixture of Experts with drift-aware token routing for continual instruction tuning of large vision language models.
Variational learning approach for estimating fractional posteriors with applications to probabilistic modeling.
Causal mediation analysis framework using Pearl's methods to decompose discrimination in AI-driven credit decisions.
Self-imitating proximal policy optimization algorithm improving exploration efficiency in sparse reward reinforcement learning.
Binary latent space protein fitness optimization using QUBO-based methods and pretrained protein language models.
Model-centric visualization framework using spatial and temporal listeners to analyze ML model behaviors.
Graph neural network method addressing oversquashing via cross-attentive cohesive subgraph embedding for long-range information.
Federated learning framework for multimodal data with heterogeneous clients and missing modalities using block-wise approach.
Spiking neural network architecture for energy-efficient predictive insulin delivery on wearable devices.
Theoretical analysis of self-supervised pre-training using two-stage M-estimation and representation symmetry to improve bounds.
Federated soft-prompts framework for continual web personalization with privacy preservation and stability-plasticity control.
Heterogeneous graph representation learning foundation model for cross-domain transfer without textual attributes or domain-specific schemas.
Reinforcement learning agents (DQN, SARSA, A2C/A3C) for automated quiz composition with topic coverage and difficulty optimization.
Empirical study of Low-Rank Adaptation (LoRA) in sequential fine-tuning of transformer encoders, analyzing catastrophic forgetting behavior.
Multimodal graph learning framework addressing topology quality issues in multimodal-attributed graphs through task-aware co-evolution.
DNNs for detecting smart contract vulnerabilities using contrastive learning and granular-ball training to handle noisy labeled datasets.
Survey of counterfactual explanation algorithms for time series classification, covering instance-based, pattern-driven, and gradient-based methods.
Distributed online algorithm for multi-agent submodular maximization under communication delays for information-gathering in dynamic environments.
RG-TTA: Meta-controller for test-time adaptation in streaming time series forecasting that modulates adaptation intensity based on regime similarity.
KVSculpt: KV cache compression for long-context LLM inference treating compression as knowledge distillation, orthogonal to quantization and low-rank methods.
Stability analysis of relative temporal-difference learning with linear function approximation, establishing convergence conditions when discount factor approaches one.
Primal-dual policy optimization algorithm for safe reinforcement learning in linear mixture CMDPs with adversarial rewards and unknown transitions.
Eigenvalue tail index of neural network weight matrices predicts test accuracy under label noise, achieving R^2=0.984 as diagnostic for data quality.
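A tail index of an eigenvalue spectrum is commonly estimated with the Hill estimator over the top-k order statistics; the sketch below shows that standard estimator, not the paper's exact diagnostic. `hill_tail_index` is a hypothetical name, and in practice the input would be the eigenvalues of W Wᵀ for a trained weight matrix W.

```python
import numpy as np

def hill_tail_index(eigvals, k=20):
    """Hill estimator of the power-law tail index from the top-k values.

    alpha_hat = 1 / mean(log(x_(i) / x_(k+1))) over the k largest values.
    """
    x = np.sort(np.abs(np.asarray(eigvals, dtype=float)))[::-1][: k + 1]
    logs = np.log(x[:k]) - np.log(x[k])
    return 1.0 / np.mean(logs)
```

A heavier tail (smaller alpha) in the weight spectrum is the kind of signal such diagnostics correlate with training quality; the estimator is sensitive to the choice of k, which is usually swept or chosen by a stability heuristic.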
ATLAS-RTC: Runtime control system for LLM agents that enforces structured output via token-level monitoring, drift detection, and closed-loop interventions during decoding.
ITQ3_S: 3-bit weight quantization method for LLM inference using rotation-domain adaptive quantization to reduce precision loss from weight distribution outliers.
Physics-informed neural networks using transformer attention mechanisms for reconstructing continuous fields from sparse observations governed by PDEs.
Proteina-Complexa: fully atomistic protein binder generation method combining conditional generative modeling with structure-based optimization.
Optimization techniques for efficient inference in large vision-language models addressing computational bottlenecks from high-resolution visual tokens.
Distributed stochastic gradient descent with game-theoretic incentives to prevent gradient manipulation by strategic agents while ensuring convergence.