HN tiredgirl4 4/3/2026

Cybernetic Entropy Control of LLMs

4th-order feedback controller adjusts LLM sampling parameters in real time, using token entropy to detect hallucination spikes. Improves MATH benchmark accuracy from 55% to 59.5% on a Qwen 2B model.
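
To make the idea of entropy-driven sampling control concrete, here is a minimal sketch. It is not the post's 4th-order controller: it assumes a single proportional feedback step on temperature, and the names `token_entropy`, `adjust_temperature`, `target_entropy`, `gain`, `t_min`, and `t_max` are hypothetical, chosen only for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adjust_temperature(temperature, entropy, target_entropy,
                       gain=0.1, t_min=0.1, t_max=1.5):
    """One proportional feedback step on the sampling temperature.

    Assumption for illustration only: entropy above the target is treated
    as a sign of instability, so the temperature is lowered; entropy below
    the target raises it. The result is clamped to [t_min, t_max].
    """
    error = entropy - target_entropy
    new_temperature = temperature - gain * error
    return max(t_min, min(t_max, new_temperature))


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain over 4 tokens
    peaked = [0.97, 0.01, 0.01, 0.01]    # confident prediction
    temperature = 1.0
    for probs in (uniform, peaked):
        h = token_entropy(probs)
        temperature = adjust_temperature(temperature, h, target_entropy=0.7)
        print(f"entropy={h:.3f} nats, temperature={temperature:.3f}")
```

A higher-order controller would keep additional state (for example, past errors or their differences) rather than reacting to the current entropy alone; the sketch only shows the feedback loop's basic shape.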

Ax Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani 4/3/2026

Therefore I am. I Think

Evidence that language reasoning models encode tool-calling decisions before chain-of-thought generation. Analysis of model decision-making timing.

Ax Weyl Lu, Chenjie Hao, Yubei Chen 4/3/2026

Deep Networks Favor Simple Data

Analysis of the OOD anomaly in which deep networks assign higher density to simple out-of-distribution data than to in-distribution test data.

Ax Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye, Xinyuan Wang, Yiduo Guo, Ziniu Li, Chenxin Li, Jingyuan Hu, Shunian Chen, Tongxu Luo, Jiaxi Bi, Zeyu Qin, Shaobo Wang, Xin Lai, Pengyuan Lyu, Junyi Li, Can Xu, Chengquan Zhang, Han Hu, Ming Yan, Benyou Wang 4/3/2026

Do Phone-Use Agents Respect Your Privacy?

MyPhoneBench, an evaluation framework measuring privacy compliance of phone-use agents during mobile task completion.

Ax Marawan Gamal Abdel Hameed, Derek Tam, Pascal Jr Tikeng Notsawo, Colin Raffel, Guillaume Rabusseau 4/3/2026

Model Merging via Data-Free Covariance Estimation

Principled layer-wise optimization approach for model merging via data-free covariance estimation without task-specific training.