Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm
Comparative study evaluating whether LLMs demonstrate Theory of Mind capabilities using the Strange Stories psychological paradigm.
TherapyGym evaluation framework for therapy chatbots measuring clinical fidelity and safety using psychotherapy rating scales.
Uncertainty-calibrated prompt optimization framework for LLM classification that measures model confidence to improve reliability.
LLM-based agent framework for automated extraction of structured political biography data from unstructured sources at scale.
DynaRAG framework extending RAG with dynamic API calls for time-sensitive queries; includes sufficiency classification and reranking.
Analysis of explainability in harmful content detection models, examining predictions on borderline and contextual cases.
MineDraft framework for batch parallel speculative decoding to accelerate LLM inference by parallelizing draft and verification stages.
Tool for collecting granular metadata about language model benchmarks to verify alignment with practitioner goals and test coverage.
Multi-task learning framework for personalized open-vocabulary keyword spotting with privacy and customization for voice assistants.
Keyword spotting framework integrating phoneme learning with personalized prosody modeling for speaker-specific voice recognition.
Study examining relationship between firms' AI technology innovation investments and consumer complaint patterns.
Adaptive Extended Kalman Filter using knowledge distillation for improved UWB/PDR indoor localization under NLOS conditions.
Method for increasing transformer modularity and interpretability through per-layer supervision to overcome distributed redundancy.
Quine runtime that implements LLM agents as native POSIX processes using OS-level isolation and scheduling instead of application-layer frameworks.
Method for distinguishing between system failures and domain shifts in industrial data streams using anomaly detection.
Study of poisoning attacks against RAG systems where adversaries corrupt retrieval corpora to manipulate LLM outputs; includes defenses.
Research on multi-agent LLM routing systems showing that quality-based delegation can fail when agents misreport performance; proposes delegation contracts to address this.
NANOZK: Zero-knowledge proof system enabling cryptographic verification that proprietary LLM API outputs actually used claimed models.
S3T-Former: Energy-efficient spike-driven state-space transformer for skeleton-based action recognition on resource-constrained edge devices.
MCP-38: Protocol-specific threat taxonomy with 38 threat categories for Model Context Protocol systems derived through systematic methodology.
Synthesizable RTL implementation of predictive coding networks enabling online, distributed hardware learning as alternative to backpropagation.
Lightweight LLM adaptation framework for technical service agents using latent logic augmentation and noise reduction techniques.
SLEA-RL: Step-level experience augmentation for multi-turn LLM agent training enabling dynamic retrieval and leveraging accumulated episode experiences.
Meta-BayFL: Probabilistic federated learning framework with Bayesian neural networks for heterogeneous data and model personalization.
Study uncovering latent phase structures and branching logic in deep RL locomotion policies for HalfCheetah control task interpretability.
Dynamic constraints framework for reinforcement learning fine-tuning that adapts constraints based on model capabilities to balance stability and optimization.
CytoSyn: Foundation diffusion model for computational histopathology enabling cell segmentation and tumor analysis from digitized slides.
Trace-based assurance framework for agentic AI orchestration with contracts, testing, and governance for LLM-coordinated multi-agent systems.
Training-only framework for few-shot CLIP adapters using heterogeneous image-patch-text graph supervision without inference cost overhead.
ARTEMIS: Neuro-symbolic framework combining neural operators and SDEs for interpretable, arbitrage-free quantitative finance models.
Discovery of bimodal drift rate structure in fast radio burst FRB 20240114A using unsupervised machine learning for astrophysics analysis.
Tula: Optimization framework for distributed large-batch training balancing communication overhead, computation cost, and generalization performance.
VC-Soup: Method for aligning LLMs with multiple conflicting human values using value-consistency guidance for trustworthy AI development.
Grace Cycle: LLM-augmented computational phenotyping framework for discovering clinical subtypes in Long COVID through iterative hypothesis generation and evidence extraction.
Conceptual framework proposing intellectual stewardship for how humans should adapt their roles in creative knowledge work alongside AI systems.
Insight-V++: Multi-agent visual reasoning framework for MLLMs enabling long-chain reasoning with high-quality training data and optimized pipelines.
Workshop report on advancing robotics and AI in healthcare, highlighting coordination needs between engineering and clinical priorities for safety and reliability.
User study demonstrating that extensive LLM use for writing assistance alters the voice, tone, and meaning of human text, including a 70% increase in essay length.
Post-training framework adapting vision-language models for safety-critical autonomous driving event detection in dashcam footage through temporal alignment.
RAG-based system using LLMs for automated cybersecurity incident analysis through targeted log filtering across multiple data sources.
Gradient-informed temporal sampling strategy for training neural PDE surrogates, improving rollout accuracy beyond uniform and augmentation-based sampling.
MolRGen benchmark and training framework for evaluating reasoning-based LLMs on de novo molecular generation for drug discovery without ground-truth molecule pairs.
Interventional Boundary Discovery method using causal inference to identify controllable state dimensions in reinforcement learning with confounded distractors.
Sharpness-aware minimization technique in logit space addressing squeezing effect in Direct Preference Optimization for LLM alignment.
Low-rank convolution optimization for neural video compression (NeRV) reducing computational cost and memory for resource-constrained environments.
Analysis of LLM alignment through concept routing rather than detection, studying political censorship across nine open-weight Chinese language models.
Measurement study comparing computational costs of mobile robotic manipulation workloads across onboard, edge, and cloud GPU platforms using foundation models.
Sparse supervised learning framework for monocular 3D object tracking in videos, reducing annotation requirements for autonomous agent perception.
ChoiceEval framework for auditing brand and cultural preference biases in LLMs used as market intermediaries affecting consumer choices.
Neural graph representation method using reinforcement learning to solve approximate subgraph matching, an NP-hard problem in graph analysis.