360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method
Benchmark and method for evaluating 360° image perception in multimodal LLMs, addressing geometric distortion and spatial reasoning challenges.
Domain adaptation method for sample-efficient transfer of drug-response prediction models from cell lines to patient tumors.
Domain adversarial training approach for robust AI-generated audio quality assessment without spurious correlations.
Scoping review of AI-driven digital mental health interventions including GenAI and HCAI across screening, support, and monitoring.
CoMAI multi-agent framework with task decomposition for robust and fair interview evaluation using coordinated LLM agents.
Technical review and taxonomy of 13 generative systems for quantum circuit and quantum code generation including agentic approaches.
State space model framework for light field super-resolution using multiple LF representations.
Visual prompt discovery method to diagnose and mitigate LVLM perception failures through semantic exploration.
Vision-language process reward models with explicit visual premise verification for reliable step scoring in reasoning.
Mixture of Experts framework for robust 3D object detection in autonomous driving under adverse weather conditions.
Genetic programming with surrogate models for dynamic multi-mode project scheduling with simulation-based optimization.
VisBrowse-Bench benchmark for evaluating visual-native search in multimodal browsing agents using MLLMs.
End-to-end framework using Speech LLMs for spoken question answering with attention-guided evidence grounding.
Human-centered architecture for integrating LLM-based cognitive assistants into manufacturing quality management systems.
Interpretable ML framework for predicting non-small cell lung cancer drug response using patient genetic data.
Security research on sentiment steering attacks targeting RAG-enabled large language models and LLM robustness.
Machine learning pipelines for radio astronomy data processing, with a focus on explainability and automated configuration.
YOLO-based deep learning for automated wasp identification with explainable AI integration for taxonomic classification.
Plaza6G platform for experimental trials in 5G/6G networks with AI-assisted orchestration of cloud and wireless resources.
Remote sensing monocular depth estimation using Vision Transformers and diffusion models for real-time processing.
DynamicGate MLP conditional computation framework using learned structural dropout and input-dependent gating for efficiency.
FederatedFactory zero-dependency framework for federated learning in non-IID scenarios using generative one-shot learning.
Age prediction models analyzed for out-of-distribution generalization, bias mitigation, and interpretability with causal implications.
Physics-guided diffusion framework for full-waveform inversion combining score-based generative models with wave-equation simulations.
Fanar 2.0 Arabic generative AI platform built on 256 H100 GPUs at QCRI with sovereign infrastructure and data pipelines.
Study identifying flaws in LLM benchmarks for Icelandic, highlighting issues with synthetic and machine-translated evaluation data.
PlotTwist creative plot generation framework using small language models with specialized training for narrative coherence.
Method for adding persistent memory to frozen encoder-decoder LLMs via trainable adapters in continuous latent space.
IndexRAG approach for multi-hop question answering that performs cross-document reasoning at indexing time using bridge entities.
SF-Mamba state space model for vision addressing non-causal patch interactions with improved computational efficiency.
SlideFormer system for fine-tuning large language models on a single GPU via an asynchronous engine and a sliding-window approach.
LenghuSky-8 eight-year all-sky cloud dataset with star-aware masks for astronomical observatories and nowcasting.
EngGPT2-16B, an open Italian LLM trained on 2.5T tokens, with efficient inference and performance comparable to larger models.
CD-FKD cross-domain feature knowledge distillation for single-domain generalization in object detection.
Multi-agent reinforcement learning approach for managing delayed channel state information in multi-satellite communication systems.
DST-Net dual-stream transformer for low-light image enhancement using spatial convolution and feature guidance.
Unlearning method for one-step generative models using unbalanced optimal transport for safer image generation.
Millisecond-resolution network dataset for time series foundation models with high-frequency data.
FEAT foundation model with linear complexity for structured data in healthcare, finance, and e-commerce with improved scalability.
DanceHA multi-agent framework for document-level aspect-based sentiment analysis, extracting ACOSI tuples from documents.
CompDiff uses hierarchical compositional diffusion to generate fair medical images across demographic groups and intersections.
EmoLLM framework integrates appraisal-grounded cognitive-emotional reasoning into LLMs for contextually appropriate responses.
Deep learning inverse design for Doherty power amplifiers using CNN surrogate models and genetic algorithms.
Analysis of human-LLM chat logs characterizing delusional spirals and negative psychological effects from extended chatbot interactions.
Manifold-Matching Autoencoders regularize autoencoders by aligning pairwise distances between latent and input spaces.
Research on classifying malicious AI agent skills using repository context to improve detection in skill marketplaces.
REFORGE reveals vulnerabilities in image generation model unlearning through multi-modal adversarial attacks in black-box settings.
BATQuant proposes outlier-resilient MXFP4 quantization via learnable block-wise optimization for deploying MLLMs and LLMs on accelerators.
Frequency-spatial fusion framework for cattle mounting pose estimation in cluttered, occluded environments.
Data-driven perimeter control for urban traffic congestion using machine learning instead of explicit modeling.