Ax Zhaoyang Zhang, Shuli Jiang, Yantao Shen, Yuting Zhang, Dhananjay Ram, Shuo Yang, Zhuowen Tu, Wei Xia, Stefano Soatto 3d ago

Reinforcement-aware Knowledge Distillation for LLM Reasoning

A reinforcement-aware knowledge distillation method that transfers RL-trained reasoning LLMs into smaller models while preserving chain-of-thought capability.

Ax Uzay Macar, Li Yang, Atticus Wang, Peter Wallich, Emmanuel Ameisen, Jack Lindsey 3d ago

Mechanisms of Introspective Awareness

Investigates mechanisms of introspective awareness in LLMs, where models detect injected steering vectors with minimal false positives.

Ax Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Yu-Lun Liu, Chih-Yu Wang, Jun-Cheng Chen 3d ago

Gen-n-Val: Agentic Image Data Generation and Validation

Agentic system for generating and validating synthetic image data to address data scarcity and label noise in vision tasks.

Ax Shahab Rahimirad, Guven Gergerli, Lucia Romero, Angela Qian, Matthew Lyle Olson, Simon Stepputtis, Joseph Campbell 3d ago

Bayesian Social Deduction with Graph-Informed Language Models

Evaluates LLM reasoning capabilities in the social deduction game Avalon using Bayesian inference with graph-informed models.