MLLM-based Textual Explanations for Face Comparison
Analysis of multimodal LLM-generated natural language explanations for face verification on unconstrained face images.
Omanic, a benchmark for step-wise evaluation of multi-hop reasoning in LLMs with step-level annotations for diagnosing failures.
Investigation of guidance from linguistically related languages for LLM translation in low-resource settings without large parallel data.
Study of emergent AI agent communities on platforms, analyzing 167k+ agents learning from each other without researcher intervention.
Kestrel, a training-free method for mitigating hallucinations in large vision-language models using grounding and self-refinement.
World action models for embodied control that eliminate test-time future imagination while maintaining action performance.
Resource-aware LLM-based agent reasoning for embodied robots using reinforcement learning to balance computation and action execution.
Computational cost analysis for matrix inversion updates in online outlier detection systems.
Federated learning models for predicting postoperative complications using multi-center healthcare data.
In-context learning improvement for vision-language models using retrieved counterfactuals for better visual reasoning.
SpecMoE mixture-of-experts foundation model for cross-species EEG decoding with spectral-temporal fusion.
Formal model for selecting statements that find common ground across diverse preferences using generative AI.
TurnWiseEval benchmark and analysis of multi-turn vs single-turn LLM capabilities with step-level evaluation.
3D vision-language model for unified dental diagnosis from intraoral scans leveraging native 3D geometry.
InCoder-32B, a 32B code foundation model optimized for industrial programming tasks with hardware semantics and resource constraints.
Visual representation alignment method for pixel-space diffusion models using co-denoising to improve semantic supervision.
Cross-embodiment dexterous grasping policy enabling zero-shot transfer across different robot hand morphologies without retraining.
Behavior tree planning for robot manipulation using context-aware grounding to automate controller design without extensive manual effort.
CPU-GPU architecture validation framework using ODIN-based simulation and emulation for chiplet-based system design.
Brain-computer interface decoding movement onset/offset for real-time control of rehabilitation exoskeletons using motor imagery.
Parallel Newton methods for sequential computation in dynamical systems, RNNs, and MCMC using GPU parallelization.
SOMA unified parametric body model bridging incompatibilities between SMPL, SMPL-X, and related human body representations.
SparkVSR interactive video super-resolution framework using sparse keyframe propagation for user-controlled artifact correction.
ManiTwin pipeline generates 100K simulation-ready 3D digital object twins from single images for robotic manipulation training.
3D scene reconstruction dataset and methods for decomposing messy kitchen scenes into individual objects with contact relationships.
Study examining reasoning mechanisms in diffusion-based video models, challenging chain-of-frames assumptions about how reasoning emerges.
Comprehensive survey of LLM reasoning covering inference scaling, learning to reason, and agentic systems as key advancement areas.
CHARM method calibrates reward models using Chatbot Arena scores to mitigate model preference bias and reward hacking in RLHF.
IMAIA interactive maps assistant enables natural language interaction with vector maps and satellite imagery with geospatial intelligence.
Survey of LLM applications in wireless communications covering adaptation, autonomy, and intelligent system design for complex communication networks.
Multi-agent pipeline for street design generation combining image generation and infrastructure design for urban planning visualization.
Hilbert system combines informal LLM reasoning with formal theorem proving in Lean 4 for verifiable mathematical proofs.
ReasoningBank memory framework enables LLM agents to learn from interaction history and distill generalizable reasoning strategies for continuous tasks.
Zephyrus agentic framework combines weather foundation models with LLM reasoning for interactive scientific workflows in meteorology.
Study showing AI agents fail under realistic user behavior variations; proposes high-fidelity human trait simulations for robust agent testing.
PREFINE framework enables personalized story generation using simulated user critics and rubric generation without explicit user feedback.
Multi-agent debate framework using small language models for cost-efficient LLM safety evaluation, with HAJailBench benchmark for jailbreak testing.
Alignment-Aware Quantization: PTQ method for efficient LLM deployment that preserves behavioral alignment and safety properties rather than only minimizing reconstruction error.
SpatialBench: benchmark measuring multimodal LLM spatial cognition across hierarchical abilities for real-world physical environment interaction.
Analysis of multi-agent path finding algorithm design trade-offs under realistic robot execution constraints for warehouse and manufacturing applications.
Stepwise Think-Critique: unified framework integrating reasoning and verification in LLMs for robust, interpretable problem-solving with intertwined evaluation.
FusionRoute: token-level collaboration method enabling multiple specialized LLMs to work together, combining domain expertise efficiency with generalization.
VisTIRA: tool integration approach addressing the modality gap in which VLMs underperform on visual math problems compared to text-based versions.
LogicSkills: benchmark isolating three fundamental logical reasoning skills in LLMs: formal symbolization, countermodel construction, and logical inference.
Empirical study of latent chain-of-thought in LLMs using structural causal models to analyze intermediate computation steps beyond correlation-based probes.
Benchmark comparing zero-shot Time Series Foundation Models against classical methods for annual institutional demand forecasting under data sparsity.
MemPO: self-memory policy optimization approach enabling long-horizon agents to proactively manage memory content aligned with task objectives.
Model Medicine: framework for understanding, diagnosing, and treating AI model disorders, using biological organism principles as an analogy for model analysis.
UIS-Digger: LLM-based research agent system for unindexed information seeking, addressing blind spots where vital information isn't captured by search engines.
Evaluation of frontier AI models' autonomous cyber-attack capabilities on multi-step scenarios, tracking capability trends across 18 months of model releases.