Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition
Framework for improving VideoLLM understanding of camera motion through benchmarking, diagnosis, and explicit geometry injection.
Visual state representations for robotic agents using what-is-where composition for dynamic scene understanding.
FedPBS: federated learning algorithm for personalized training on non-IID data with improved robustness.
Sample-efficient hypergradient estimation for decentralized bi-level reinforcement learning in strategic decision-making.
Proxy models reduce cost and latency of AI queries in SQL databases by 100x through approximation techniques.
Domain-grounded tiered retrieval architecture to reduce LLM hallucinations through retrieval-based verification.
Evolutionarily Stable Stackelberg Equilibrium: game theory solution concept for asymmetric leader-follower games.
Ontology-Guided Diffusion for zero-shot sim2real transfer using neuro-symbolic approach to bridge simulation-reality gap.
Agent Control Protocol: formal specification for admission control governance of autonomous agents with cryptographic identity and policy compliance.
Multi-agent AI system with six specialized agents for automated NIST CSF-aligned cybersecurity risk assessments for small organizations.
Study showing fine-tuning bypasses LLM safety mechanisms and triggers verbatim recall of copyrighted training data.
Explainable DRL framework for autonomous APT defense using provenance-based graphs and stage-aware modeling.
LLM-based workflow system for multidisciplinary software development coordinating domain experts and developers in automotive.
PRISM photonic accelerator approach reducing KV cache memory bandwidth from O(n) to O(1) for long-context LLM inference.
mSFT algorithm for optimizing heterogeneous multi-task SFT data mixtures by dynamically adjusting compute per sub-dataset.
Weather prediction combining radar observations with foundation model priors for extended nowcasting horizons.
Sim-to-real transfer for humanoid robot control using state-dependent joint torque perturbations instead of domain randomization.
Inference-time scaling with lightweight latent verifiers instead of MLLMs to reduce computational cost in verification.
Method using causal interventions and vision-language models to explain sparse autoencoder features in vision models.
Interpretable evaluation combining symbolic rules with mechanistic interpretability to detect memorization vs genuine generalization.
ITPO framework for optimizing multi-turn human-LLM interactions via RL despite sparse rewards and user stochasticity.
Theoretical analysis of upper entropy computation for credal sets and uncertainty quantification. Pure mathematics focus.
Training method combining synthetic QA and document generation to improve LLM knowledge beyond RAG performance ceiling.
Safe reinforcement learning framework inferring constraints from user preferences with minimal expert demonstrations.
RL agent optimizing operator kernels on Huawei Ascend NPUs. Addresses knowledge gap in alternative hardware ecosystem.
Causal signal reconstruction approach for converting sparse news sentiment into reliable time series for financial/tech analysis.
StateLinFormer model using linear attention for navigation agents with long-term memory. Addresses context window limitations in Transformers.
Research on curriculum learning with dual criteria for temporal data. Proposes improved difficulty-based training scheduling.
PoiCGAN: Poisoning attack method against federated learning systems using feature-label joint perturbation.
APreQEL: Adaptive mixed precision quantization technique for deploying large language models on edge devices with reduced memory and computational requirements.
Research on how LLMs form discrete decision boundaries within continuous semantic spaces through context-driven topological distortion of number representations.
Physics-informed neural networks using residual attention for steady-state electrothermal multiphysics simulation in energy systems.
MetaKube: LLM framework for Kubernetes failure diagnosis with Episodic Pattern Memory Network that learns from operational history to improve diagnostic accuracy over time.
Deep learning model for automated sleep disorder staging from EEG with analysis of generalization gaps in clinical populations.
Theoretical framework analyzing fundamental performance limits when deploying fixed LLMs as optimization modules in agentic systems.
Method for steering code LLMs via activation space manipulation to control programming language and library preferences at inference time.
VPBoost applies variable projection to gradient boosting for improved training of smooth parametric learners like neural networks.
Continuous-time diffusion model for generating synthetic electronic health records with mixed numerical and categorical features.
Framework for generating interpretable explanations of learned behaviors in RL agents with formal behavior definition.
Curriculum learning approach for contextual RL using closed-form updates for self-paced task sequencing.
Lightweight fairness method for LLM-based recommenders using kernelized projection and adapters without fine-tuning.
Domain adaptation framework for foundation models using probabilistic geometric alignment and Bayesian transport.
Mechanistic analysis of grokking phenomenon in ReLU MLPs on modular arithmetic revealing algorithmic structure.
Theoretical study showing diffusion models learn manifold geometry before memorization under manifold hypothesis.
Solution for training instability in physics-informed neural networks on epidemiological models by addressing gradient pathology.
Analysis of neural collapse phenomenon in regression models across multiple layers showing low-rank structure.
Theoretical analysis revealing convex equivalences in ReLU neural networks from sparse signal processing perspective.
Kolmogorov-Arnold networks combining neural learning with symbolic structure for interpretable scientific equation discovery.
Analysis of activation function curvature role in adversarial robustness using parameterized activation family.
Study of vision-language models' robustness to distribution shifts in visual deductive reasoning tasks.