Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures
Causal evaluation protocol measuring whether intermediate structures (rubrics, checklists) causally determine LLM outputs or merely accompany them.
Multimodal LLM (ExpressMind) for expressway operation, applying cognitive intelligence to transportation systems beyond rule-based approaches.
Investigates customization approaches for smaller open-source LLMs to improve domain-specific code generation without relying on large proprietary models.
Proposes guardrails for LLM-enabled robots allocating scarce assistance across multiple users with conflicting values and unpredictable LLM behavior.
BenchPreS evaluates whether memory-based LLM personalization appropriately suppresses user preferences in context-sensitive communication settings.
V-DyKnow benchmark evaluates how vision-language models handle time-sensitive knowledge that becomes outdated after training.
Framework for runtime governance of LLM-based AI agents, balancing task completion with legal and reputational costs through execution-path monitoring.
Analyzes AI reasoning about geopolitical conflicts using a temporally grounded case study of a 2026 Middle East conflict that postdates model training cutoffs.
Integrates constraint propagation into dynamic programming to bridge the gap between state-based and constraint-based paradigms for combinatorial problems.
Pipeline for developing norm-compliant reinforcement learning agents inspired by the Pinocchio story, addressing safe AI integration into society.
Fine-tuning LLMs on journal publication decisions to enable models to assess scientific merit and predict promising research directions.
Mobile app teaching digital literacy and prebunking misinformation tactics through interactive challenges in nine languages.
Code LLM series (7B-40B) using code-flow multi-stage training paradigm to capture dynamic software logic evolution.
Investigation of how user personalization and mental health disclosure affect harmful behavior in tool-using LLM agents.
Benchmark for evaluating continual learning in biomedical NLP across task-diverse datasets with robustness and efficiency metrics.
Study of reproducibility in AI coding agents, showing agent-to-agent variation produces nonstandard errors in empirical results.
Two-stage RL framework training multimodal agents for anticipatory reasoning and long-term planning in multi-step tasks.
Pipeline integrating forecasting models and ML regressors with inventory optimization, evaluated on M5 Walmart dataset.
Evaluation of conformal factuality as reliability guarantee for RAG-based LLMs with novel metrics and robustness analysis.
Large-scale multimodal surgical dataset and foundation models for cross-procedure generalization in surgical AI tasks.
Study of cultural bias in LLMs and prompt-based methods to improve cultural alignment for policy and decision-making tasks.
RL environment where LLM agents learn to generate professional presentations through research, planning, and tool use with multi-component reward system.
Method for training LLM agents to leverage rich environment feedback through reflective experience and post-training, improving long-horizon planning.
Benchmark evaluating audio-visual social interactivity capabilities of omni-modal LLMs in dynamic dialogue settings.
RL framework using Soft Actor-Critic to learn adaptive ray sampling policies for efficient neural radiance field rendering.
Multimodal AI search framework combining vector search, hybrid retrieval, and reasoning for pharmaceutical data across text, images, audio, and video.
Evaluation of VLMs (GPT-4V, Gemini, Claude, LLaVA) for navigation assistance tasks for people with vision impairments.
Framework extending RLHF using multi-dimensional rubric-based rewards instead of scalar signals for RL training.
Inference-time governance approach for LLMs using adaptive prompt routing to enable social alignment without retraining.
Federated learning framework integrating knowledge graphs and temporal transformers for early sepsis prediction in multi-center ICUs.
Study on recursive language models with self-reflective program search for long-context handling, addressing information extraction challenges.
Analysis of the Gini Index's role in prompt-based classification for detecting and mitigating class accuracy disparities in long-tailed datasets.
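The Gini Index used for disparity analysis is a standard inequality measure (mean absolute pairwise difference, normalized). A minimal sketch of computing it over per-class accuracies; the function name and the example numbers are illustrative, not taken from the paper:

```python
def gini_index(values):
    """Gini inequality coefficient: 0.0 when all values are equal,
    approaching 1.0 as the distribution becomes maximally unequal."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs.
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)

# Per-class accuracies from a hypothetical long-tailed classifier:
balanced = gini_index([0.9, 0.9, 0.9, 0.9])    # no disparity -> 0.0
skewed = gini_index([0.95, 0.90, 0.40, 0.10])  # head classes dominate -> larger
```

A large coefficient over the per-class accuracy vector flags that the classifier's performance is concentrated on head classes, which is the disparity signal the summary refers to.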
Defense mechanism against steganographic collusion in multi-agent reinforcement learning using dynamic representational circuit breaking.
Model rectification framework using attribution-guided rank-one editing to fix unreliable neural network behaviors on corrupted samples.
Open-source pipeline extending single-agent AI orthodontic treatment planning to a dual-agent framework with improved tooth segmentation and landmark detection.
Application of quantum amplitude estimation to catastrophe insurance tail-risk pricing with convergence analysis and NISQ noise effects.
AI agent system for hardware design reviews using LLMs to verify semantic correctness of component connections against datasheets.
Framework for LLM application release management using automated self-testing with evidence-based quality gates across five dimensions.
Analysis of transformer training dynamics using Spectral Edge Dynamics to measure coherent optimization directions versus stochastic noise.
Context-aware safety framework for personalized text-to-image models that prevents misuse without broad concept erasure.
Analysis of multi-turn safety failures in LLMs through state-space perspective, showing structured contextual evolution enables jailbreaks.
Token compression method for omnimodal LLMs using dynamic audio-driven semantic chunking to reduce inference costs for audio-visual processing.
Domain adaptation approach for remaining useful life prediction using evidential learning under incomplete degradation trajectories.
Study on engineering challenges in LLM-based multi-agent systems, addressing context pressure, coordination errors, and system drift at scale.
Defense framework against backdoor attacks in LLMs using trigger generation and inversion to locate and mitigate malicious triggers.
Study on over-smoothing in hypergraph neural networks using Ricci flow theory to improve message passing and layer depth handling.
Research on using inference time as a proxy to estimate LLM energy consumption, addressing opacity in API-based model access and environmental impact.
SEMAG: self-evolutionary multi-agent code generation framework that decomposes programming tasks into planning, coding, and debugging stages with adaptive workflow selection.
Uncertainty-guided multi-expert framework for imbalanced sequence learning addressing poor expert specialization and prediction conflicts in long-tailed data.
Retrieval-augmented generation framework using GPT-4 to accelerate CO2 reduction catalyst discovery by exploring chemical spaces and interpreting results.
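The retrieve-then-prompt pattern underlying the catalyst-discovery framework can be sketched generically. Everything here (toy vectors, document texts, function names) is illustrative scaffolding, not the paper's pipeline:

```python
def retrieve(query_vec, corpus, k=2):
    """Rank documents by dot-product similarity to the query and return top k."""
    scored = sorted(
        corpus,
        key=lambda d: -sum(q * x for q, x in zip(query_vec, d["vec"])),
    )
    return scored[:k]

def build_prompt(question, docs):
    """Assemble a retrieval-augmented prompt: context passages, then question."""
    context = "\n".join(f"- {d['text']}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy corpus with hand-assigned 2-d embeddings:
corpus = [
    {"text": "Cu catalysts favor C2+ products in CO2 reduction.", "vec": [1.0, 0.2]},
    {"text": "Ag catalysts favor CO formation.", "vec": [0.1, 1.0]},
]
docs = retrieve([1.0, 0.0], corpus, k=1)
prompt = build_prompt("Which metal favors C2+ products?", docs)
```

In the actual framework the embeddings would come from a learned encoder and the assembled prompt would be sent to GPT-4; the sketch only shows the retrieval-and-assembly skeleton.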