3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding
Method to reduce hallucinations in 3D embodied AI agents using visual contrastive decoding on multimodal LLMs.
Differentiable probabilistic programming for gamma-ray astrophysical analysis using GPU acceleration and vectorization.
Multimodal inference task over text, audio, and video for producing calibrated probability estimates of hypotheses with fine-grained uncertainty.
LLM application translating network quality metrics into user experience quality for multimedia systems.
Deep learning framework for tracking and reconstructing ligament lineage during liquid sheet breakup using multi-object tracking.
Hybrid behavioral analysis framework combining static and dynamic analysis for early-stage ransomware detection before file encryption.
Active learning strategy for predicting detonation performance of energetic materials using limited experimental data.
Probabilistic framework for inferring 3D cloud microphysical properties from 2D satellite observations for weather modeling.
Hardware-agnostic world models for quadrupedal robots using morphology conditioning to generalize across different robot embodiments.
Deep learning approach for detecting stepping-stone intrusions by correlating network flows at relay hosts with low false positive rates.
Statistical framework for designing large-scale factorial experiments with overlapping conditions on shared user populations.
Multi-view circuit graph benchmark suite standardizing representations for GNN-based physical design tasks from RTL to GDSII.
Scene graph benchmark for content moderation with spatial grounding and interpretability for detecting sensitive behavior in images.
Transformer architecture with uncertainty quantification for medical image classification, addressing overconfident predictions in clinical settings.
RL framework for improving LLM reasoning by optimizing for logical consistency and structural integrity of reasoning processes, not just final answers.
Proposes utility-centric approach to information retrieval for RAG systems, optimizing retrieved documents for task completion rather than topical relevance.
Novel algorithm for learning directed acyclic graphs from observational data with positive-valued variables using moment-ratio scoring.
Supervised adaptation of vision-language models outperforms prompting for cloud segmentation in remote sensing under domain shift.
Projected functional gradient descent algorithm for online quantile regression in nonparametric additive models.
ASTRA: Adaptive semantic tree reasoning architecture for table question answering using LLMs with improved serialization and schema flexibility.
Hypergraph neural networks applied to enumerate Minimal Unsatisfiable Subsets in constraint satisfaction problems more efficiently.
Regime-conditional retrieval approach with transferable router for two-hop QA using surface-text predicates for routing decisions.
ImageProtector method using visual prompt injection to prevent multi-modal LLMs from analyzing images, for privacy protection.
Multi-agent mixture of experts with plasticity enhancement for UAV communication networks under non-stationary conditions using deep RL.
Proposes Advantage-Guided Diffusion for model-based RL using diffusion world models to reduce compounding errors in trajectory generation.
Continual visual place recognition system for aerial autonomy addressing catastrophic forgetting using geometric memory management in dynamic environments.
NyayaMind framework for transparent legal judgment prediction in Indian courts using structured reasoning aligned with legal methodology.
CLIP-Inspector framework for detecting backdoor attacks in prompt-tuned CLIP models via out-of-distribution trigger inversion.
Dynamic Assembly Forest model detecting diffusion-generated images using ensemble methods as an alternative to deep neural network approaches.
FIRE-CIR framework for composed image retrieval using vision-language models with fine-grained reasoning about what to preserve and modify.
MATCHA: DNN deployment framework generating concurrent schedules for heterogeneous multi-accelerator edge SoCs using constraint programming optimization.
Theoretical framework for identifying causal effects using single proxy variables of unobserved confounders under completeness assumptions.
Energy-Shifting deep learning framework for accelerating Monte Carlo dose calculation in radiotherapy by synthesizing distributions from monoenergetic inputs.
MixFlow method improving diffusion models by using mixed source distributions instead of standard Gaussian to reduce generative path curvature.
Symbolic-Neural Consistency Audit (SNCA) framework that extracts LLM self-stated safety policies via prompts and verifies model adherence to them.
YOLOv8-based facade parsing system augmented with alignment loss to enforce structural coherence in architectural element detection.
Riemannian gradient descent approach for optimizing low-rank functional tensor networks on arbitrary loss functions beyond least-squares regression.
Online intention prediction framework for autonomous systems using inverse reinforcement learning with time-varying objectives and unknown parameters.
Iterative Identification Closure framework for determining causal identifiability in linear structural equation models with latent confounders.
Fragment-based graph neural network integrated with many-body expansion theory for predicting potential energy surfaces in chemical systems.
CrossAbSense framework using protein language model encoders and attention decoders to predict antibody properties for therapeutic design validation.
Hybrid quantum-classical physics-informed neural networks for hydrological modeling with uncertainty quantification using variational quantum circuits.
Theoretical analysis of loss landscape in two-layer ReLU neural networks, characterizing local minima and their connection to stochastic gradient descent dynamics.
Learning-to-Defer framework that routes inputs to experts while selecting additional information (retrieved documents, tool outputs) to provide each expert, extending traditional routing systems.
Large-scale synthetic dataset with 2M videos covering physical phenomena for training physics-aware AI systems.
Systematic comparison of LLM task adaptation strategies including instruction revision, prompt optimization, and retrieval methods.
Video diffusion model learning joint distribution of video frames and camera trajectories for novel view synthesis.
Neural network architecture for haptic signal prediction in tactile internet using mode decomposition.
Open-source dataset and code for classifying human activity from accelerometer sensor data.
Model poisoning attack on federated learning without client collusion using independent adversarial updates.