Julio Candanedo

The Diffusion-Attention Connection

A theoretical connection between Transformers, diffusion maps, and magnetic Laplacians through Markov geometry.

Hua-Dong Xiong (School of Psychological and Brain Sciences, Georgia Tech), Li Ji-An (Department of Psychology, New York University), Jiaqi Huang (Department of Cognitive Science, Indiana University Bloomington, Honda Research Institute), Robert C. Wilson (School of Psychological and Brain Sciences, Georgia Tech, Center of Excellence for Computational Cognition, Georgia Tech), Kwonjoon Lee (Honda Research Institute), Xue-Xin Wei (Departments of Neuroscience and Psychology, The University of Texas at Austin)

Human-like Working Memory Interference in Large Language Models

An analysis of working-memory limitations in LLMs, with comparisons to biological systems.

Vijay Lingam, Aditya Golatkar, Anwesan Pal, Ben Vo, Narayanan Sadagopan, Alessandro Achille, Jun Huan, Anoop Deoras, Stefano Soatto

ExecTune: Effective Steering of Black-Box LLMs with Guide Models

A Guide-Core Policies framework for black-box LLM agents, in which guide models generate structured strategies that core models execute, reducing inference costs.

Smita Deb, Shirin Panahi, Mulugeta Haile, Ying-Cheng Lai

Vestibular reservoir computing

Physical reservoir computing inspired by the biological vestibular system, addressing hardware complexity with a designed uncoupled topology.

Zunhai Su, Hengyuan Zhang, Wei Wu, Yifan Zhang, Yaxiu Liu, He Xiao, Qingyao Yang, Yuxuan Sun, Rui Yang, Chao Zhang, Keyu Fan, Weihao Ye, Jing Xiong, Hui Shen, Chaofan Tao, Taiqiang Wu, Zhongwei Wan, Yulei Qian, Yuchen Xie, Ngai Wong

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

A survey of the attention-sink phenomenon in transformers, covering utilization, interpretation, and mitigation strategies.