Surveys natural language interfaces to spatial and temporal databases, covering methods and taxonomy for NLIDBs with geospatial data.
Proposes energy-based modeling for discrete graph generation using transport-aligned sampling to improve efficiency and quality.
Extends the Priority Inheritance with Backtracking (PIBT) algorithm for multi-agent path finding to handle multiple dependencies in congested environments.
Proposes SortedRL, a length-aware scheduling method to accelerate RL training for LLMs, reducing the rollout bottleneck in long chain-of-thought generation.
Examines how humans attribute errors in multi-agent AI systems under delayed feedback, revealing biases in decision-making across sequential steps.
Studies the practical feasibility of adversarial attacks against ML-based IoT intrusion detection systems, addressing implementation constraints.
Evaluates whether LLM-generated tests reflect genuine program understanding or superficial pattern reproduction, examining behavior under software evolution.
Proposes 3DCity-LLM, a multimodal LLM framework for 3D city-scale perception using coarse-to-fine feature encoding across object, relational, and global contexts.
Introduces a benchmark dataset and evaluation framework for code review agents, addressing code quality assurance as AI-generated code scales.
Improves diffusion-based image inpainting through one-step inversion to reduce artifacts and sampling steps.
Proposes VTAM, extending video-action models for embodied AI with tactile sensing for contact-rich physical interactions beyond vision-only approaches.
ReqFusion integrates multiple LLM providers (GPT, Claude, Groq) to automate software requirements extraction, classification, and analysis.
Shows LLMs produce unstable outputs on gender inference tasks under minimal context variations, revealing dependence on cultural stereotypes in training data.
Proposes VISOR, a method to reduce inference costs in large vision-language models through dynamic, sparse vision-language interactions without information bottlenecks.
Evaluates the ability of vision-language models to perform pre-diagnostic sanity checks in medical imaging, identifying gaps between fluent text generation and safe visual understanding.
RealCQA-V2 benchmark for evaluating multimodal reasoning on scientific chart understanding with visual entailment verification.
BSDS system architecture integrating AI agents with data platforms for business-semantic-centric decision-making and workflows.
TRACE framework for self-evolving agent benchmarks that dynamically increase difficulty using test-time exploration and validation.
BIRD-INTERACT benchmark evaluating LLMs on multi-turn text-to-SQL tasks with dynamic interactions and error handling.
BuilderBench benchmark for evaluating AI agents' ability to learn through exploration and interaction beyond training data patterns.
ML system for detecting methane emissions from satellite spectroscopy data, addressing false detections in environmental monitoring.
Hybrid Stackelberg game and diffusion-based auction mechanism for task offloading among collaborative AI agents in Internet of Agents.
Domain-specific risk taxonomy and evaluation framework for LLM-based driving assistants addressing safety-critical scenarios.
Entropy-based analysis shows that lowering entropy improves tool-use behavior in LLM agents, cutting excessive tool calls and latency.
Classical Chinese jailbreak prompts bypass LLM safety constraints more effectively than English prompts, owing to their obscurity and conciseness.
Evidence-grounded diagnostic reasoning agent using vision-language models for chest X-ray interpretation.
LLM-enabled agentic workflow automating coverage analysis and gap identification for IC formal verification.
Extends the RAG paradigm to time-series foundation models for predictive maintenance with covariate dynamics.
Adopts goal recognition heuristics for classical planning problems to improve plan search prioritization.
Multi-agent routing system aware of cascading failures in tree versus cyclic graph topologies with geometry-switching.
Knowledge graph-driven multi-agent LLM framework for semantic geospatial data discovery with improved retrieval.
Cerebra: multi-agent AI system with specialized agents for EHR, clinical notes, and multimodal data in dementia assessment.
Evaluates the effectiveness of ChatGPT (GPT-3.5/4) at extracting research challenges from HCI literature at scale using a two-step approach.
Method for reliable out-of-distribution virtual screening in drug discovery using extrapolatory pseudo-label matching.
Convergence analysis of linear temporal difference learning with arbitrary features, removing the linear independence assumption.
HFLDD: hybrid federated learning framework using dataset distillation to handle non-IID data distribution skew.
LOGSAFE: logic-guided defense mechanism for federated learning in time-series cyber-physical systems against poisoning attacks.
Attention calibration method to reduce object hallucinations in vision-language models through vision token reordering.
BalanceKV: streaming algorithm using discrepancy theory to approximate attention for efficient long-context LLM token generation.
Training-free graph filtering approach for multimodal recommendation systems without neural network overhead.
Studies the use of a deliberation-enhancing chatbot to help groups detect deepfake text through human-AI collaboration.
Expectation Reflection introduces a multiplicative learning paradigm that uses observation-prediction ratios instead of additive gradient updates.
LLM-based agentic system that autonomously generates, evaluates, and refines quantum feature maps for quantum machine learning.
Information-theoretic framework to characterize and quantify information leakage in concept-based models for interpretability.
Meta-optimization framework for LLMs to generate generalizable heuristics for combinatorial optimization without manually predefined evolutionary operators.
PRISM: video dataset condensation method that handles the interdependence of spatial appearance and temporal dynamics for sparse motion.
Addresses contextual contradiction in text-to-image diffusion models where concept combinations contradict learned priors.
CyberGym large-scale benchmark with 1,507 real-world vulnerabilities for evaluating AI agents' dynamic cybersecurity capabilities.
Learns a minimum action distance metric from state trajectories alone, capturing the environment structure of MDPs without rewards or action labels.
UniCA unifies covariate adaptation for time series foundation models to handle heterogeneous covariates, including categorical and multimodal data.