Measuring and curing reasoning rigidity: from decorative chain-of-thought to genuine faithfulness
Introduces Step-Level Reasoning Capacity metric and LC-CoSR training method to measure and reduce reasoning rigidity in chain-of-thought reasoning.
Training-free spatial-temporal token compression method for video MLLMs achieving high-ratio visual token reduction via forest modeling.
Philosophical analysis of control dynamics and relational ethics in human-AI companion interactions, examining provider authority.
Memory-sparse attention mechanism enabling LLMs to scale context to 100M tokens through efficient memory modeling instead of full attention.
Multimodal deception detection using a schema-driven approach with audiovisual analysis across multicultural datasets for forensic applications.
LLM-enabled threat hunting framework for SOC analysts integrating Splunk with policy-guided decision making for APT detection.
Benchmark framework for evaluating multimodal LLMs as perceptual backbones for autonomous agents in 3D environments with decision-dense scenarios.
Reinforcement learning framework enabling multimodal LLMs to autonomously crop and focus on regions of interest for improved perception in complex visual scenes.
Coarse-to-fine reasoning framework using reinforcement learning for interpretable multimodal sentiment analysis with MLLMs and hint-guided training.
Bio-inspired self-evolving network architecture for autonomous agents using evolutionary approaches instead of static human-defined protocols.
Open source sound effect foundation model from Sony AI with audio encoder/decoder and text-to-audio capabilities.
Framework analyzing agent communication protocols for LLM systems across three layers: communication, syntactic, and semantic. Systematically organizes 18 representative protocols.
Evaluates whether LLMs can infer causal intervention effects from natural language descriptions using behavioral simulation on climate-psychology interventions.
Generative camera system using visual preference optimization for cinematic trajectory generation. Addresses framing and composition without a director feedback loop.
LitPivot: tool supporting iterative research idea development through dynamic literature contextualization and AI-driven critique.
Method to identify valence-arousal emotional subspace in LLM representations for emotion steering and behavioral control.
Privacy-preserving system using LLMs to analyze student attention in classroom videos without storing identifiable footage.
Zero-shot quantization method using weight-space arithmetic to improve post-training quantization robustness across models.
Economic analysis of AI productivity gains showing sustained tool use erodes worker expertise over time.
Taxonomy of LLM-based coding agent architectures via source-code analysis, categorizing control loops, tool definitions, and context strategies.
Analysis of policy routing circuits in alignment-trained LLMs, localizing attention gates and amplifier heads controlling refusal behavior.
StableTTA: training-free test-time adaptation improving image classification accuracy via novel ensemble strategies.
EduIllustrate: benchmark evaluating LLMs on automated generation of diagram-rich educational content combining visuals and reasoning.
VideoStir: retrieval-augmented generation system for understanding long videos with multimodal LLMs using spatio-temporal structure.
Theoretical analysis proving limitations of continuous wrapper defenses against prompt injection attacks in LLMs.
MoBiE: binarization framework for efficient inference in Mixture-of-Experts LLMs via post-training quantization.
Analysis of emotional representation geometry in LLM latent spaces for transparency and safety.
Fine-grained benchmark evaluating multimodal LLMs on manufacturing scenarios.
Privacy-preserving synthetic data generation using LLMs with differential privacy mechanisms.
LLM-based tool for identifying HIV-related stigma in clinical narratives.
Data selection framework for multi-turn dialogue instruction tuning addressing dataset noise and inconsistency.
Genetic programming combined with latent space optimization for symbolic regression via neural encoders.
Attack exploiting denoising irreversibility in diffusion language models to bypass safety alignment.
Optimization for geometric foundation models in monocular SLAM via efficient keyframe selection.
Empirical study measuring bias amplification across multi-agent system topologies and feedback loops.
Text-to-speech synchronization method for automated dubbing with phonetic alignment.
Interactive ASR system with semantic coherence evaluation and human-like correction mechanisms.
Physics-guided surrogate learning for turbulent flow control without the heavy computational cost of RL.
Framework for managing hierarchical instruction conflicts in multi-source LLM agent environments.
Theoretical connection between Transformers, diffusion maps, and magnetic Laplacians through Markov geometry.
Framework for evaluating fairness and equity across patient subgroups in brain tumor segmentation models.
Study of deliberative alignment for deeper safety in reasoning LLMs via attribution analysis.
Analysis of working memory limitations in LLMs and comparison with biological systems.
RWKV-based RL approach with explicit belief state representation for partial observability problems.
Computational model using Transformer self-prior to simulate mirror self-recognition behavior without external rewards.
Theoretical analysis comparing entropy regularization and covariance-based mechanisms for controlling policy collapse in RL-enhanced LLMs.
Framework for robust structured prediction using Tsallis reweighting and task-agnostic prompting with XML structure for group-robust fine-tuning.
Guide-Core Policies framework for black-box LLM agents, in which guide models generate structured strategies that core models execute, reducing inference costs.
Review of explainable AI mechanisms for human activity recognition in healthcare, assistive-living, and smart-environment applications.
Unified framework for visual encoding and decoding from neural activity, modeling consistency between brain stimulus prediction and reconstruction.