Ax Yogesh Agrawal (University of Central Florida), Aniruddha Dutta (University of Central Florida), Md Mahadi Hasan (University of Central Florida), Santu Karmaker (University of Central Florida), Aritra Dutta (University of Central Florida) 3/20/2026

FinTradeBench: A Financial Reasoning Benchmark for LLMs

FinTradeBench benchmark for evaluating LLM reasoning on financial decision-making using company fundamentals and trading signals.

Ax Huaide Jiang, Yash Chaudhary, Yuping Wang, Zehao Wang, Raghav Sharma, Manan Mehta, Yang Zhou, Lichao Sun, Zhiwen Fan, Zhengzhong Tu, Jiachen Li 3/20/2026

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

NavTrust benchmark evaluating robustness of embodied navigation agents (VLN and OGN) under real-world data corruptions.

Ax Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, Fariza Rashid, Maya Carlyle, Afaf Taïk, Kyra Wilson, Peter Douglas, Theodora Skeadas, Gabriella Waters, Rumman Chowdhury, Thiago Lacerda 3/20/2026

CIRCLE: A Framework for Evaluating AI from a Real-World Lens

CIRCLE lifecycle framework bridging the gap between AI model metrics and real-world deployment outcomes through a six-stage evaluation.

Ax Jakub Grudzien Kuba, Benjamin Kurt Miller, Sergey Levine, Pieter Abbeel 3/20/2026

Offline Materials Optimization with CliqueFlowmer

CliqueFlowmer approach for computational materials discovery using neural networks for offline optimization of material properties.

Ax Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian 3/20/2026

Efficient Reasoning with Balanced Thinking

Method to reduce overthinking and underthinking in Large Reasoning Models through balanced token allocation for efficient inference.

Ax Ruijiang Gao, Steven Chong Xiao 3/20/2026

Nonstandard Errors in AI Agents

Study of nonstandard errors in AI coding agents: 150 Claude agents are deployed on the same market analysis tasks, revealing agent-to-agent variation in analytical choices.

Ax Jillian Fisher, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, Katharina Reinecke 3/20/2026

Biased AI can Influence Political Decision-Making

Experimental study measuring how partisan biases in LLMs influence human political opinions and decision-making.

Ax Alexandru Apostu, Silviu Gheorghe, Andrei Hîji, Nicolae Cleju, Andrei Pătraşcu, Cristian Rusu, Radu Ionescu, Paul Irofti 3/20/2026

Detecting and Mitigating DDoS Attacks with AI: A Survey

Survey of AI-based detection and mitigation methods for DDoS attacks, with a taxonomy of attack categories.