Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
Analysis showing chain-of-thought prompting underperforms direct answering in medical vision-language models due to perception bottlenecks in domain-specific tasks.
Memory-efficient continual learning method using prototypical exemplar condensation to reduce storage requirements while maintaining performance.
Parallel framework combining imitation and reinforcement learning for autonomous driving, addressing limitations of sequential fine-tuning approaches.
Method to improve pretrained generative robot policies by replacing sampled noise with optimized constant noise vectors for downstream reward optimization.
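The core idea in the entry above — replacing per-rollout sampled noise with a single optimized constant noise vector — can be illustrated with a derivative-free sketch. This is a hedged illustration only: the paper's actual optimizer and policy interface are unknown, and `policy_reward` plus the hill-climbing loop are invented here for demonstration.

```python
import random

def optimize_constant_noise(policy_reward, dim, iters=300, sigma=0.1, seed=0):
    """Hill-climb one fixed noise vector z to maximize the downstream
    reward of a frozen generative policy, instead of resampling fresh
    noise on every rollout. `policy_reward(z)` is assumed to run the
    pretrained policy with noise z and return a scalar reward."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from ordinary sampled noise
    best = policy_reward(z)
    for _ in range(iters):
        cand = [zi + rng.gauss(0.0, sigma) for zi in z]  # local perturbation
        r = policy_reward(cand)
        if r > best:  # keep only improving candidates
            z, best = cand, r
    return z, best

# Toy reward: prefer noise vectors close to the origin.
z_star, r_star = optimize_constant_noise(lambda z: -sum(v * v for v in z), dim=4)
```

The point of the constant-noise view is that the optimization target becomes a plain vector, so any black-box or gradient-based search can be applied without touching the policy weights.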
Mid-training adaptation strategy for LLMs to improve automatic summarization of radiology reports, exploring domain-specific pre-training approaches.
RAM: motion capture system for 3D human pose reconstruction in unconstrained video with occlusion handling and temporal smoothing.
ChronoCon: contrastive learning approach for disease progression assessment from longitudinal medical imaging without explicit severity annotations.
CAIAMAR: multi-agent framework for context-aware image anonymization in street-level imagery using agentic reasoning.
Kill-chain canary methodology for tracking prompt injection attacks across multi-agent LLM systems with stage-level diagnostics.
System for making mathematical theorems interactive by grounding LLM-generated explanations in formal representations enabling execution and stepping.
Framework for eliciting and verbalizing LLM assumptions to explain and mitigate sycophantic behavior in model outputs.
Multi-stage LLM-assisted workflow for scientific algorithm development separating theory extraction, formal specification, and code generation.
Method for LLM personalization using a small portfolio of models capturing diverse user preferences without per-user models.
Distributional reinforcement learning approach for decision-making in healthcare, accounting for uncertainty across heterogeneous populations.
ALTO: system for adaptive LoRA hyperparameter tuning and orchestration across heterogeneous LLM fine-tuning workloads in multi-tenant environments.
DiffHDR: video diffusion model approach for converting low-dynamic-range videos to high-dynamic-range format.
WisdomInterrogatory (LuWen): open-source Chinese legal language model built on Baichuan foundation model for legal domain applications.
System for safe capability evolution in embodied agents with compatibility checking and runtime rollback mechanisms.
Training-free open-vocabulary semantic segmentation framework (OV-Stitcher) leveraging pretrained vision-language models without additional training.
HyperMem: hypergraph-based memory architecture for conversational agents enabling long-term context tracking and high-order associations.
Quantum-inspired ARIMA methodology combining quantum autocorrelation with variational quantum circuits for time series analysis.
Vision-language benchmark (CrashSight) for evaluating traffic crash scene understanding from infrastructure perspective.
Physics-aligned simulator (SIM1) for generating synthetic data in deformable object robotic manipulation tasks.
Framework combining LLMs with graph neural networks for text-attributed graph learning in low-resource settings using GNN feedback.
Bayesian optimization method (MG-TuRBO) for high-dimensional traffic simulation calibration, comparing genetic algorithms with Bayesian approaches.
QuanBench+: unified benchmark for LLM quantum code generation across Qiskit, PennyLane, Cirq with 42 aligned executable tasks.
Benchmark evaluating robustness of LLM reasoning with 14 perturbation techniques applied to mathematical reasoning tasks.
Silhouette loss function for learning discriminative representations with explicit geometric properties in embedding space.
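For context, the classic silhouette coefficient that such a loss presumably builds on scores each point as s = (b - a) / max(a, b), where a is the mean intra-cluster distance and b is the mean distance to the nearest other cluster. Below is a minimal NumPy sketch of a silhouette-style objective; the paper's actual (likely differentiable) formulation is not reproduced here.

```python
import numpy as np

def silhouette_loss(X, labels):
    """Negative mean silhouette coefficient over a batch of embeddings.
    Minimizing it favors tight, well-separated clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # silhouette is undefined for singleton clusters
        a = D[i, same].mean()  # mean intra-cluster distance
        b = min(D[i, labels == c].mean()  # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return -float(np.mean(scores))
```

Scores near 1 (loss near -1) mean embeddings sit far closer to their own cluster than to any other, which is the geometric property the entry refers to.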
Distillation framework compressing genomic foundation models for efficient mRNA representation learning.
Quantum-classical hybrid molecular generator using VAE and quantum computing for interpretable drug discovery.
DRTO combines token-level RLHF with distributional robustness to improve LLM resilience to input perturbations and formatting changes.
Automated label function generation for data annotation using LLMs with structured exploration-exploitation strategy.
Analysis of cross-modal alignment between vision and language encoders using functional map framework from computational geometry.
TinyML Z-score anomaly detection system running on resource-constrained microcontrollers using power side-channel data.
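Z-score thresholding itself is standard; the sketch below shows the basic detection rule on a power trace. The threshold value and the absence of windowing are illustrative assumptions, not the deployed system's configuration.

```python
import math

def zscore_anomalies(samples, threshold=3.0):
    """Return indices of power readings that deviate from the sample
    mean by more than `threshold` standard deviations."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    if std == 0.0:
        return []  # constant trace: nothing to flag
    return [i for i, x in enumerate(samples) if abs(x - mean) / std > threshold]

# A flat power trace with one spike at index 15.
print(zscore_anomalies([1.0] * 15 + [10.0]))  # → [15]
```

On a microcontroller this would typically run over a fixed-size ring buffer, possibly in integer arithmetic; the float version above is for clarity only.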
Dual-branch reconstruction method for multivariate time series anomaly detection using autoregressive flow-based density estimation.
CSAttention: sparse attention mechanism for accelerating LLM inference by reducing KV-cache bottlenecks through centroid-scoring without retraining.
Flow-matching generative model for CFD surrogate modeling on unstructured meshes, as an alternative to conventional deep learning surrogates.
Framework evaluating when LLMs should act versus escalate decisions using uncertainty estimation across five real-world domains.
Machine learning system predicting user engagement in digital mental health interventions using explainable ML methods.
AlphaLab: autonomous research system using frontier LLMs as agents to automate full experimental cycles in optimization domains without human intervention.
Analysis of hallucination phase transitions in Whisper ASR models using spectral sensitivity theorem and eigenspectra analysis.
Multi-task learning for wireless interference detection and identification using adversarial training methods.
Federated learning approach using exemplar replay to reduce catastrophic forgetting in continual learning with dynamic heterogeneity.
StructRL: recovers dynamic programming structure from distributional RL learning dynamics. Bridges data-driven and structured approaches for stable learning.
Bayesian inference for spiking neural networks in speech processing. Explores weight uncertainty and loss landscape smoothing for temporal tasks.
Evidential Transformation Network: adapts pretrained models for post-hoc uncertainty estimation. Efficient alternative to ensembles/MC dropout for deployed models.
VOLTA: benchmark comparing uncertainty quantification methods for deep learning. Evaluates 10 UQ baselines across modalities and distribution shifts.
Game-theoretic analysis of creator incentives in multi-agent recommender systems. Cooperative game formulation for fair collaboration in bandit problems.
PRAGMA: foundation models for banking event sequences. Transformer-based architecture with self-supervised pretraining on financial transaction data.
Skip-Connected Policy Optimization (SKPO) for reinforcement learning with reasoning tasks. Improves upon GRPO by addressing high-variance advantage estimation.