Automated Conjecture Resolution with Formal Verification
Automated framework for research-level mathematical problem solving combining LLMs with formal verification to reliably resolve conjectures and verify proofs.
Representational collapse in multi-agent LLM committees: similarity measurements show that agents produce redundant rationales despite different role prompts, addressed with diversity-aware consensus.
InCaRPose: Transformer-based model for relative camera pose estimation in automotive in-cabin monitoring with distorted imaging environments.
k-Maximum Inner Product Attention for efficient graph transformers, reducing quadratic complexity while maintaining expressiveness for large-scale graphs.
Analysis of analogical reasoning in LLMs comparing probed representations with prompted performance, revealing limitations in latent abstraction and generalization.
Field experiment on an LLM agent that provides iterative, personalized behavioral nudges for electricity and hot-water conservation across intervention rounds.
Regime-calibrated demand priors for ride-hailing dispatch using historical segmentation and multi-metric similarity ensemble for fleet repositioning.
Lorentz-Invariant Auction mechanism for bandwidth allocation across heterogeneous-delay networks including LEO satellites and deep-space relays.
I-CALM: prompt-only intervention reducing LLM hallucinations by incentivizing confidence-aware abstention through reward scheme announcements and humility principles.
DC-Ada: reward-only decentralized adaptation for heterogeneous multi-robot teams, adapting frozen policies to mismatched sensor configurations.
Secure-by-design GenAI framework for cloud security and forensics using LLMs with defenses against prompt injection and forensic rigor requirements.
Spatio-temporal sparse autoencoders for interpretable video representation learning, using contrastive objectives and hierarchical grouping to preserve temporal coherence.
Multi-turn decision making framework for goal-oriented conversational systems balancing information acquisition and target commitment under user intent uncertainty.
AdaptFuse: training-free framework for LLMs to perform Bayesian belief updating across multi-turn interactions without fine-tuning on user data.
Regime-calibrated approach for ride-hailing demand prediction using historical trip segmentation and similarity ensemble matching across temporal patterns.
Low-bit mixed-precision attention kernel using MXFP format for efficient LLM inference, reducing memory bandwidth and computational costs of transformer attention mechanisms.
Symbolic-Vector Attention Fusion (SVAF): mechanism for multi-agent communication enabling agents to evaluate which signal dimensions to use in collective intelligence systems.
VLA-Forget: unlearning framework for vision-language-action embodied models in robotic manipulation, removing unsafe behaviors while preserving perception and language grounding.
TraceGuard: structured multi-dimensional monitoring protocol for detecting attacks on untrusted AI agents, addressing collusion risks through five-dimensional evaluation of agent reasoning and actions.
Gram-anchored prompt learning method for Vision-Language Models using second-order statistics for parameter-efficient adaptation.
Analysis of noisy label robustness in Reinforcement Learning with Verifiable Rewards for training LLM reasoning models.
Causality laundering: security vulnerability in tool-calling LLM agents where adversaries exfiltrate information through denial-feedback patterns.
CoopGuard: stateful cooperative multi-agent defense framework protecting LLMs against evolving adversarial attacks across multi-round interactions.
First comparative analysis of emotion vector extraction methods across 9 small language models spanning multiple architectural families.
BAAI Cardiac Agent: multimodal AI agent for automated cardiovascular disease diagnosis from cardiac MRI with specialized expert models.
Real-time traffic monitoring system using YOLOv11 object detection with multi-object tracking in PyTorch/OpenCV.
Theoretical analysis of parent selection mechanisms in genetic algorithms and evolutionary computation optimization.
Fine-tuning language models to enhance embeddings for cognitive modeling in online education systems.
Multi-stage LLM-assisted workflow for generating quantum many-body algorithms using LaTeX intermediate specifications.
Research on generalization guarantees for stochastic bilevel optimization in machine learning, hyperparameter optimization, and meta-learning.
Analysis of the carbon footprint of GenAI tool usage and conference activities in software architecture research.
Container-based testbed for reproducible cybersecurity experimentation and network traffic generation.
Study on training robust vision features for CT imaging to enable transfer learning for clinical diagnostic tasks.
Research on dexterous robotic grasping using reinforcement learning with sparse guidance for multi-finger manipulation control.
Method for scalable LLM personalization using portfolio selection across heterogeneous user preferences, maintaining a single shared model instead of per-user instances.
Test-time adaptation approach for cross-region generalization in land surface temperature prediction, addressing domain shifts in remote sensing applications.
Method for incomplete multi-view multi-label classification using a shared codebook and fused-teacher self-distillation under dual-missing conditions.
GENFIG1 benchmark evaluating vision-language models on generating Figure 1 visual summaries of scholarly research, assessing conceptual richness in scientific communication.
GraphicDesignBench: first comprehensive benchmark for evaluating AI models on professional graphic design tasks including layout, typography, and communicative intent.
Multi-objective automated discovery framework for microscopy and characterization workflows, addressing premature convergence through exploration coordination across structural and spectral spaces.
Analysis of learning complexity in evolutionary robotics versus robot learning, examining optimization time scales and what is being optimized in robotic systems.
ClawArena benchmark evaluating AI agents' ability to maintain correct beliefs in evolving information environments with contradictory sources and changing evidence.
Postcolonial analysis of structural bias toward American English in foundation models, examining geopolitical data curation and linguistic standardization in LLM development.
LOCARD: agentic framework modeling blockchain forensics as sequential decision-making, enabling dynamic iterative investigations instead of static inference pipelines.
Formal framework using Temporal Behavior Trees to repair suboptimal trajectories from imperfect demonstrations before downstream imitation and reinforcement learning.
Framework and benchmark for converting web elements into autonomous agents as foundational primitives for the Agentic Web, enabling automated agent generation from digital assets.
Dual-path teacher-student framework for learning aligned multimodal embeddings from weakly paired audio-visual corpora using hierarchical semantic consistency.
Analysis of Mixture-of-Experts token routing across training phases using congestion game modeling, tracking a three-phase trajectory in OLMoE and OpenMoE models.
Systematic audit of probability calibration in multimodal deep learning models combining histopathology images and genomic data for cancer survival prediction.
Federated reinforcement learning from human feedback method for aligning LLMs with diverse human preferences while preserving privacy and achieving fair reward aggregation.