Isolater - Feed

Ax Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, Rui Wang 3/20/2026

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

F2LLM-v2 multilingual embedding models (80M-14B parameters) supporting 200+ languages with emphasis on low-resource language coverage.

Ax Yogesh Agrawal (University of Central Florida), Aniruddha Dutta (University of Central Florida), Md Mahadi Hasan (University of Central Florida), Santu Karmaker (University of Central Florida), Aritra Dutta (University of Central Florida) 3/20/2026

FinTradeBench: A Financial Reasoning Benchmark for LLMs

FinTradeBench benchmark for evaluating LLM reasoning on financial decision-making using company fundamentals and trading signals.

Ax Huaide Jiang, Yash Chaudhary, Yuping Wang, Zehao Wang, Raghav Sharma, Manan Mehta, Yang Zhou, Lichao Sun, Zhiwen Fan, Zhengzhong Tu, Jiachen Li 3/20/2026

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

NavTrust benchmark evaluating robustness of embodied navigation agents (VLN and OGN) under real-world data corruptions.

Ax Rahul Patel, Elias B. Khalil, David Bergman 3/20/2026

Heuristic Multiobjective Discrete Optimization using Restricted Decision Diagrams

Heuristic methods for constructing restricted decision diagrams to approximate Pareto frontiers in multiobjective optimization.

Ax Ashlin Iser 3/20/2026

Automated Explanation Selection for Scientific Discovery

Framework combining machine learning with automated reasoning for generating and selecting explanations in scientific discovery tasks.

Ax Chang Yang, Xinrun Wang, Junzhe Jiang, Qinggang Zhang, Xiao Huang 3/20/2026

LLM-Based World Models Can Make Decisions Solely, But Rigorous Evaluations are Needed

Analysis of LLM-based world models for decision-making in reasoning systems, identifying evaluation gaps and methodological issues.

Ax Antoine Dolant, Praveen Kumar 3/20/2026

Agentic LLM Framework for Adaptive Decision Discourse

Framework using LLM agents to simulate decision discourse by representing diverse stakeholder perspectives in complex problem-solving.

Ax Minjie Shen, Yanshu Li, Lulu Chen, Zhichao Fan, Yanhang Li, Qikai Yang 3/20/2026

From Mind to Machine: The Rise of Manus AI as a Fully Autonomous Digital Agent

Manus AI general-purpose autonomous agent combining LLM reasoning with execution capabilities for complex end-to-end tasks.

Ax Mingfeng Fan, Jianan Zhou, Yifeng Zhang, Yaoxin Wu, Jinbiao Chen, Guillaume Adrien Sartoretti 3/20/2026

Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation

Deep reinforcement learning approach for multi-objective combinatorial optimization using conditional computation and preference decomposition.

Ax Jiaqi Chen, Mingfeng Fan, Xuefeng Zhang, Jingsong Liang, Yuhong Cao, Guohua Wu, Guillaume Adrien Sartoretti 3/20/2026

Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning

Multimodal learning framework for solving Generalized Traveling Salesman Problem in robotic task planning.

Ax Yifan Zhang 3/20/2026

Single Agent Robust Deep Reinforcement Learning for Bus Fleet Control

Single-agent reinforcement learning framework for bus fleet control addressing traffic stochasticity and demand variability.

Ax Xijia Tao, Yihua Teng, Xinxing Su, Xinyu Fu, Jihao Wu, Chaofan Tao, Ziru Liu, Haoli Bai, Rui Liu, Lingpeng Kong 3/20/2026

MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents

MMSearch-Plus benchmark for multimodal browsing agents requiring genuine vision-text reasoning and iterative retrieval verification.

Ax Jacqueline Maasch, John Kalantari, Kia Khezeli 3/20/2026

CausalARC: Abstract Reasoning with Causal World Models

CausalARC testbed for evaluating AI reasoning on abstract tasks with limited data and distribution shift using causal world models.

Ax Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary 3/20/2026

Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation

Bayesian evaluation framework replacing Pass@k metric for more stable and reliable LLM reasoning performance assessment.

Ax Ruolan Cheng, Yong Deng 3/20/2026

An Order-Sensitive Conflict Measure for Random Permutation Sets

Theoretical framework for measuring conflicts in random permutation sets using order-dependent uncertainty fusion.

Ax Arefeh Kazemi, Hamza Qadeer, Joachim Wagner, Hossein Hosseini, Sri Balaaji Natarajan Kalaivendan, Brian Davis 3/20/2026

SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection

SynBullying dataset uses multiple LLMs to generate synthetic conversational data for cyberbullying detection research.

Ax Yibin Wen, Qingmei Li, Zi Ye, Jiarui Zhang, Zurong Mai, Jing Wu, Shuohong Lou, Yuhang Chen, Henglian Huang, Xiaoya Fan, Yang Zhang, Defeng Gu, Lingyuan Zhao, Yutong Lu, Haohuan Fu, Jianxi Huang, Juepeng Zheng 3/20/2026

AgroCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture

AgroCoT benchmark evaluates reasoning capabilities of vision-language models for agricultural applications like crop monitoring and pest detection.

Ax Deliang Wen, Ke Sun 3/20/2026

Memory Bear AI: A Breakthrough from Memory to Cognition Toward Artificial General Intelligence

Memory Bear system applies cognitive science principles to address LLM memory limitations, hallucinations, and context window constraints.

Ax Christopher A. McClurg, Alan R. Wagner 3/20/2026

Developing a Discrete-Event Simulator of School Shooter Behavior from VR Data

VR-based discrete-event simulator for school security evaluation using behavioral data.

Ax Philipp Schoenegger, Matt Carlson, Chris Schneider, Chris Daly 3/20/2026

Verifiable Semantics for Agent-to-Agent Communication

Certification protocol ensuring consistent semantic understanding between agents using stimulus-meaning model and empirical testing.

Ax Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Hao Zhen, Ninghao Liu, Linas Baltrunas 3/20/2026

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation at Industry Scale

Data-centric framework learning optimal verbalization for converting user interaction logs into natural language for LLM-based recommendation systems.

Ax Heejin Jo 3/20/2026

Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem

Variable isolation study examining prompt architecture layers enabling LLMs to solve reasoning benchmarks like the car wash problem.

Ax Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, Fariza Rashid, Maya Carlyle, Afaf Taïk, Kyra Wilson, Peter Douglas, Theodora Skeadas, Gabriella Waters, Rumman Chowdhury, Thiago Lacerda 3/20/2026

CIRCLE: A Framework for Evaluating AI from a Real-World Lens

CIRCLE lifecycle framework bridging the gap between AI model metrics and real-world deployment outcomes through a six-stage evaluation.

Ax Jiangyu Chen 3/20/2026

AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment

AI4S-SDS system combining LLM agents with sparse MCTS and differentiable physics for automated chemical solvent design.

Ax Jakub Grudzien Kuba, Benjamin Kurt Miller, Sergey Levine, Pieter Abbeel 3/20/2026

Offline Materials Optimization with CliqueFlowmer

CliqueFlowmer approach for computational materials discovery using neural networks for offline optimization of material properties.

Ax Yunfei Xie, Kevin Wang, Bobby Cheng, Jianzhu Yao, Zhizhou Sha, Alexander Duffy, Yihan Xi, Hongyuan Mei, Cheston Tan, Chen Wei, Pramod Viswanath, Zhangyang Wang 3/20/2026

MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games

MEMO framework reducing variance in multi-turn multi-agent LLM game evaluations through memory augmentation and context optimization.

Ax Yunhang Qian, Xiaobin Hu, Jiaquan Yu, Siyang Xin, Xiaokun Chen, Jiangning Zhang, Peng-Tao Jiang, Jiawei Liu, Hongwei Bran Li 3/20/2026

MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems

MedMASLab unified framework and benchmark for multimodal medical multi-agent systems with standardized integration and cross-specialty evaluation.

Ax Haihua Luo, Xuming Ran, Tommi Kärkkäinen, Zhonghua Chen, Jiangrong Shen, Qi Xu, Fengyu Cong 3/20/2026

Reversible Lifelong Model Editing via Semantic Routing-Based LoRA

SoLA framework for reversible lifelong model editing in LLMs using semantic routing with LoRA modules to prevent knowledge forgetting.

Ax Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian 3/20/2026

Efficient Reasoning with Balanced Thinking

Method to reduce overthinking and underthinking in Large Reasoning Models through balanced token allocation for efficient inference.

Ax Xuanyu Zhu, Yuhao Dong, Rundong Wang, Yang Shi, Zhipeng Wu, Yinlun Peng, YiFan Zhang, Yihang Lou, Yuanxing Zhang, Ziwei Liu, Yan Bai, Yuan Zhou 3/20/2026

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

VTC-Bench evaluating multimodal LLM agents on complex visual tool composition, addressing limitations in existing tool-use benchmarks.

Ax Jing Ye, Xinpei Zhao, Lu Xiang, Yaping Zhang, Chengqing Zong 3/20/2026

Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning

Hybrid scalar-verbal RL approach for emotional support dialogue systems using user reactions as learning signals instead of expert-defined rewards.

Ax Andrea Tupini, Lars Liden, Reuben Tan, Yu Wang, Jianfeng Gao 3/20/2026

AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback

AsgardBench benchmark for evaluating visually-grounded interactive planning and plan adaptation based on visual observations.

Ax Cosimo Spera 3/20/2026

Safety is Non-Compositional: A Formal Framework for Capability-Based AI Systems

Formal proof that safety is non-compositional when combining agents with conjunctive capability dependencies.

Ax Yu Li, Rui Miao, Zhengling Qi, Tian Lan 3/20/2026

ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning

ARISE hierarchical RL framework for mathematical reasoning in LLMs that learns reusable strategies across problem instances.

Ax Hugo Math 3/20/2026

Learning to Predict, Discover, and Reason in High-Dimensional Event Sequences

Machine learning approach for predicting and discovering error patterns in vehicle diagnostic trouble codes using temporal sequence analysis.

Ax Ruijiang Gao, Steven Chong Xiao 3/20/2026

Nonstandard Errors in AI Agents

Study of nonstandard errors in AI coding agents: 150 Claude agents are deployed on market analysis tasks, revealing agent-to-agent variation in analytical choices.

Ax Yi Nian, Haosen Cao, Shenzhe Zhu, Henry Peng Zou, Qingqing Luan, Yue Zhao 3/20/2026

When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution

IET framework for attributing multi-agent system outputs to specific agents without execution logs, enabling accountability in agent interactions.

Ax Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Chi Harold Liu, Zhiwei Xu, Guoliang Fan, Rui Zhao, Ziyue Li, Hangyu Mao 3/20/2026

SQLBench: A Comprehensive Evaluation for Text-to-SQL Capabilities of Large Language Models

SQLBench benchmark for evaluating Text-to-SQL capabilities of LLMs across sub-tasks, addressing gaps in prompt templates and performance assessment.

Ax Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Serbetar Karlo, Dong-Kyu Chae 3/20/2026

CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions

Graph learning model for drug-drug interaction prediction addressing generalization and robustness in extreme cases.

Ax Aws Khalil, Jaerock Kwon 3/20/2026

PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles

Deep learning framework mitigating perception latency in vision-based lane-keeping for autonomous vehicles using imitation learning.

Ax Jillian Fisher, Shangbin Feng, Robert Aron, Thomas Richardson, Yejin Choi, Daniel W. Fisher, Jennifer Pan, Yulia Tsvetkov, Katharina Reinecke 3/20/2026

Biased AI can Influence Political Decision-Making

Experimental study measuring how partisan biases in LLMs influence human political opinions and decision-making.

Ax Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine 3/20/2026

Cliqueformer: Model-Based Optimization with Structured Transformers

Structured transformer approach for offline model-based optimization combining reinforcement learning and generative modeling for design problems.

Ax Yifan Zhang, Junhui Hou 3/20/2026

Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?

Framework addressing limitations of contrastive distillation for 3D representation learning by capturing modality-specific features.

Ax Zehao Chen, Rong Pan 3/20/2026

SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers

Autoregressive transformer approach for component-based colored SVG generation from text descriptions.

Ax Rasmus Aavang, Giovanni Rizzi, Rasmus Bøggild, Alexandre Iolov, Mike Zhang, Johannes Bjerva 3/20/2026

HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

Dataset for hierarchical KPI extraction from earnings filings using iXBRL structured financial documents.

Ax Xinxin Zhao, Xinmei Huang, Haoyang Li, Jing Zhang, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, Hong Chen 3/20/2026

LLMIA: An Out-of-the-Box Index Advisor via In-Context Learning with LLMs

LLM-based index advisor for database optimization using in-context learning to iteratively refine index recommendations.

Ax Mingyang Liu, Gabriele Farina, Asuman Ozdaglar 3/20/2026

Differentially Private Equilibrium Finding in Polymatrix Games

Equilibrium finding algorithms in polymatrix games under differential privacy constraints with hardness results.

Ax Alexandru Apostu, Silviu Gheorghe, Andrei Hîji, Nicolae Cleju, Andrei Pătrașcu, Cristian Rusu, Radu Ionescu, Paul Irofti 3/20/2026

Detecting and Mitigating DDoS Attacks with AI: A Survey

Survey of AI-based detection and mitigation methods for DDoS attacks with taxonomy of attack categories.

Ax Lovedeep Gondara, Jonathan Simkin, Shebnum Devji, Gregory Arbour, Raymond Ng 3/20/2026

ELM: A Hybrid Ensemble of Language Models for Automated Tumor Group Classification in Population-Based Cancer Registries

Ensemble of language models for automated tumor group classification from unstructured pathology reports in cancer registries.

Ax Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali, Jin-Hee Cho 3/20/2026

OPUS-VFL: Incentivizing Optimal Privacy-Utility Tradeoffs in Vertical Federated Learning

Federated learning system balancing privacy-utility tradeoffs with incentive mechanisms and heterogeneous resource accommodation across organizations.