PhotoAgent: A Robotic Photographer with Spatial and Aesthetic Understanding
Embodied AI agent integrating multimodal LLMs with chain-of-thought reasoning for robotic photography tasks.
PinPoint method for identifying instruction-relevant image regions in VLMs to reduce computational overhead.
Study evaluating whether frontier LLMs genuinely use their reasoning steps or generate them post hoc as decorative narratives.
End-to-end table recognition method with detail-aware learning and cell-level visual alignment.
Uncertainty-integrated neural network for unsupervised anomaly detection in industrial and medical imaging.
DETR-based framework for detecting miniature drones in complex environments.
Security analysis framework for LLM agent deployments covering model, tool code, credentials, and MCP configurations.
GNN method that uses a transformer architecture to address over-smoothing in rumor detection on social media propagation trees.
Dual-View Pheromone Pathway Network (DPPN) architecture for persistent structural memory in neural networks. Identifies coordinate system requirements.
Agent-Sentry: Security system for bounding LLM agents via execution provenance tracking. Addresses safety and security concerns in agentic systems.
Empirical study of sim-to-real transfer for robotic dexterous manipulation using vision-language-action models. Addresses the synthetic-to-real gap.
Shows that confidence calibration fails when annotators disagree. Proposes calibrating against the annotator label distribution rather than majority-vote labels.
Off-policy evaluation framework for optimizing survival outcomes with right-censored data. Applied to healthcare and retention decisions.
ForestPrune: Training-free token compression for video MLLMs using spatial-temporal modeling. Achieves high-ratio compression for video processing.
Discussion of EU AI Act implementation and a proposed European AI Agency for regulatory oversight and governance. Policy-focused.
EVA: Reinforcement learning method for video understanding agents using multimodal LLMs. Performs adaptive frame sampling and reasoning without manually designed workflows.
Method for LLMs to return set-valued predictions with coverage guarantees instead of single outputs. Improves answer discovery through repeated sampling.
Dataset of 9,224 Dari-language YouTube videos labeled for misinformation detection and harm levels. Addresses a gap in non-English misinformation research.
Graph foundation models tested for zero-shot generalization across different GNN architectures and scales.
Visual backdoor attacks exploit mobile GUI agents via notification-based remote action execution.
Questions tabular data generation via probabilistic circuits and argues that current benchmarks overstate progress.
Concept-based explainability framework for flood/wildfire detection models in disaster management.
GLA-CLIP enables training-free open-vocabulary semantic segmentation with global-local window alignment.
Kolmogorov-Arnold networks improve YOLOv10 interpretability for object detection in degraded conditions.
Generative AI synthesizes full-range lung CT scans to address medical imaging data scarcity.
Climate foundation models tested for robustness under no-analog distribution shifts in future climate states.
Evaluation of RAG fine-tuning for EDA long-form generation, introducing TriFEX, a new human evaluation metric.
MSR-HuBERT self-supervised pre-training handles multiple audio sampling rates with adaptive downsampling.
DBAutoDoc automates database schema documentation combining statistical analysis with iterative LLM refinement.
Systematic literature review of ML models for early detection of burnout in software engineers.
AuthorMix uses modular layer-wise adapters for lightweight, flexible authorship style transfer with meaning preservation.
LLMs detect microservice architecture patterns across multiple programming languages, outperforming single-language tools.
Explainable AI analysis reveals AI-generated text detectors exploit dataset artifacts rather than genuine detection signals.
Activation watermarking technique detects adaptive adversarial attacks against LLMs attempting to evade safety monitoring.
Semantic ID tokens enable LLM-based generative recommendation systems with efficient decoding over large item corpora.
Reward modeling from implicit human feedback, such as clicks, for cost-effective LLM alignment via RLHF.
Foundational ML theory for learning under regime variation with evolving learner state and evaluation conditions.
Investigation of neural ODEs and SDEs for model-based reinforcement learning, showing neural SDEs better capture stochasticity in environment dynamics.
WeCAN: reinforcement learning framework for heterogeneous DAG scheduling addressing task compatibility, resource constraints, and rapid schedule generation.
AI lifecycle management for split RAN intelligent controller orchestration across non-terrestrial networks, comparing ground-centric and distributed deployment scenarios.
SafeSeek framework for universal attribution of safety circuits in LLMs using mechanistic interpretability to understand alignment, jailbreak, and backdoor behaviors.
Query-efficient jailbreak fuzzing method for LLMs that identifies token importance during prompt mutation to reduce redundant searching under query constraints.
Multimodal framework for human-multi-agent interaction integrating perception, embodied expression, and coordinated decision-making in shared physical spaces.
Analysis of an LLM-based social network where autonomous AI agents interact through natural language, studying collective dynamics and emergent network fragility.
Comparative study of seven machine learning models for hourly weather forecasting in complex topography, including XGBoost, LSTM, and CNN-LSTM variants.
Agentic AI platform for portfolio investment screening, using LLM agents for fundamental and sentiment analysis with a deliberation mechanism that produces buy/sell signals.
Curriculum learning framework for automated radiology report generation from 3D CT volumes using Llama 3.2, addressing sequence length and class imbalance challenges.
Philosophical analysis of normative implications when AI companions are updated, examining provider control and relationship structure.
Uses LLMs to analyze user perception of Android's Earthquake Alert system from social media data on the 2025 Türkiye earthquake.
Proposes contrastive metric learning for point-cloud segmentation in detector systems, using density-based clustering in the learned metric space.