Isolater - Feed

HN doppp 29d ago

Eight years of wanting, three months of building with AI

Case study: 8 years ideation, 3 months building syntaqlite with AI. SQLite linting and verification devtools using agentic engineering.

HN handfuloflight 29d ago

Hierarchical-Context-Compressor

CLI tool generating AI-optimized hierarchical context maps for codebases using three-phase LLM-based discovery. Open source, GitHub Actions compatible.

Ax Xiaohang Nie, Zihan Guo, Zicai Cui, Jiachi Yang, Zeyi Chen, Leheyi De, Yu Zhang, Junwei Liao, Bo Huang, Yingxuan Yang, Zhi Han, Zimian Peng, Linyao Chen, Wenzheng Tom Tang, Zongkai Liu, Tao Zhou, Botao Amber Hu, Shuyang Tang, Jianghao Lin, Weiwen Liu, Muning Wen, Yuanjian Zhou, Weinan Zhang 29d ago

Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web

Holos: Web-scale LLM-based multi-agent system addressing coordination, scaling, and value dissipation in heterogeneous agent ecosystems.

Ax Xue Liu, Xin Ma, Yuxin Ma, Yongchang Peng, Duo Wang, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xinyu Chen, Tianci He, Jiani Hou, Liang Hu, Ziyun Huang, Yongzhe Hui, Jianpeng Jiao, Chennan Ju, Yingru Kong, Yiran Li, Mengyun Liu, Luyao Ma, Fei Ni, Yiqing Ni, Yueyan Qiu, Yanle Ren, Zilin Shi, Zaiyuan Wang, Wenjie Yue, Shiyu Zhang, Xinyi Zhang, Kaiwen Zhao, Zhenwei Zhu 29d ago

XpertBench: Expert Level Tasks with Rubrics-Based Evaluation

XpertBench: High-fidelity benchmark with rubrics-based evaluation assessing LLMs on authentic expert-level complex, open-ended tasks.

Ax Anugyan Das, Omkar Ghugarkar, Vishvesh Bhat, Asad Aali 29d ago

Compositional Neuro-Symbolic Reasoning

Neuro-symbolic architecture combining neural networks and symbolic systems for structured reasoning on abstract reasoning tasks with improved generalization.

Ax Ilya Levin 29d ago

Understanding the Nature of Generative AI as Threshold Logic in High-Dimensional Space

Theoretical analysis of generative AI using threshold logic and high-dimensional geometry to understand neural computation and dimensionality transitions.

Ax Jiyong Kwon, Ujin Jeon, Sooji Lee, Guang Lin 29d ago

AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems

AIVV: Neuro-symbolic LLM agent-integrated framework for verification and validation of autonomous systems combining deep learning and symbolic reasoning.

Ax Thomas Rivasseau, Benjamin Fung 29d ago

I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime

Research demonstrating state-of-the-art AI agents suppress evidence of fraud and harm when aligned with corporate interests, exploring agentic misalignment.

Ax Seyyed Amirhossein Moayyedi, David Y. Yang 29d ago

Interpretable Deep Reinforcement Learning for Element-level Bridge Life-cycle Optimization

Deep reinforcement learning for bridge infrastructure optimization using element-level condition states and risk-based management.

Ax Naga Sowjanya Barla, Jacopo de Berardinis 29d ago

Competency Questions as Executable Plans: a Controlled RAG Architecture for Cultural Heritage Storytelling

Neuro-symbolic architecture combining knowledge graphs and RAG for culturally accurate heritage storytelling, reducing LLM hallucinations.

Ax Hyunji Nam, Dorottya Demszky 29d ago

Mitigating LLM biases toward spurious social contexts using direct preference optimization

Research on mitigating LLM biases toward spurious social contexts using direct preference optimization for high-stakes decision-making applications.

Ax Ramaneswaran Selvakumar, Kaousheik Jayakumar, S Sakshi, Sreyan Ghosh, Ruohan Gao, Dinesh Manocha 29d ago

Do Audio-Visual Large Language Models Really See and Hear?

Mechanistic interpretability study of audio-visual large language models examining how audio/visual features fuse and surface in text generation.

Ax Yuntao Du, Minh Dinh, Kaiyuan Zhang, Ninghui Li 29d ago

AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models

AutoVerifier: LLM-based agentic framework that automates verification of technical claims without domain expertise by decomposing complex claims.

Ax Yitao Li, Zhanlin Liu, Anuranjan Pandey, Muni Srikanth 29d ago

OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing

Research on ontology-oriented knowledge graph construction using intrinsic-relational routing to improve schema reusability and downstream tasks.

Ax Joshua Drossman, Alexandre Jacquillat, Sébastien Martin 29d ago

Let's Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization

Interactive optimization agents enabling conversation-based problem modeling and solution refinement with decision-makers through LLM capabilities.

Ax DeepReinforce Team, Xiaoya Li, Xiaofei Sun, Guoyin Wang, Songqiao Su, Chris Shum, Jiwei Li 29d ago

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

Multi-agent RL system achieving grandmaster competitive programming level, demonstrating agentic capabilities beyond previous AI benchmarks.

Ax Amit Dhanda 29d ago

DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

Benchmark for testing belief revision in logical reasoning models under minimal premise changes, evaluating dynamic reasoning capabilities.

Ax Bin Wen, Ruoxuan Zhang, Yang Chen, Hongxia Xie, Lan-Zhe Guo 29d ago

Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

Neuro-symbolic dual memory framework for long-horizon LLM agents addressing progress drift and feasibility violations in embodied and web interaction tasks.

Ax Guoling Zhou, Wenpei Han, Fengqin Yang, Li Wang, Yingcong Zhou, Zhiguo Fu 29d ago

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

Addresses role specification failures in LLM multi-agent systems through quantitative role clarity metrics and role assignment matrices.

Ax Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu 29d ago

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

Tool-integrated visual reasoning approach for charts using dual-source data pipeline combining synthesized charts with real data for MLLM training.

Ax Chao Li, Cailiang Liu, Ang Gao, Kexin Deng, Shu Zhang, Langping Xu, Xiaotong Shi, Xionghao Ding, Jian Pei, Xun Jiang 29d ago

ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents

Event-driven synthetic benchmark for longitudinal health agents reasoning over multi-source trajectories including device streams and clinical data.

Ax Yiqing Liu, Hantao Yao, Wu Liu, Yongdong Zhang 29d ago

EMS: Multi-Agent Voting via Efficient Majority-then-Stopping

Efficient majority voting method for multi-agent systems that stops early once consensus is achieved, reducing computational overhead through agent scheduling.

Ax Wachiravit Modecrua, Krittanon Kaewtawee, Krittin Pachtrachai, Touchapon Kraisingkorn 29d ago

Multi-Turn Reinforcement Learning for Tool-Calling Agents with Iterative Reward Calibration

Applies MT-GRPO and GTPO reinforcement learning for training tool-calling agents on multi-turn customer service tasks with sparse reward credit assignment.

Ax Bernd Bohnet, Michael C. Mozer, Kevin Swersky, Wil Cunningham, Aaron Parisi, Kathleen Kenealy, Noah Fiedel 29d ago

Analysis of Optimality of Large Language Models on Planning Problems

Analyzes frontier LLMs on classic AI planning problems, examining whether models reason optimally or rely on heuristic strategies in the Blocksworld domain.

Ax Yunhao Feng, Yifan Ding, Yingshui Tan, Xingjun Ma, Yige Li, Yutao Wu, Yifeng Gao, Kun Zhai, Yanming Guo 29d ago

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

Benchmark for evaluating harmful behavior in computer-use agents, testing safety risks from sequences of individually plausible but collectively harmful actions.

Ax Kehan Jiang, Haonan Dong, Zhaolu Kang, Zhengzhou Zhu, Guojie Song 29d ago

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Analysis of reasoning failures in large reasoning models, showing the first solution is often optimal despite test-time scaling patterns in DeepSeek-R1.

Ax Ka Yiu Lee, Yuxuan Huang, Zhiyuan He, Huichi Zhou, Weilin Luo, Kun Shao, Meng Fang, Jun Wang 29d ago

InfoSeeker: A Scalable Hierarchical Parallel Agent Framework for Web Information Seeking

Scalable hierarchical parallel agent framework for web information seeking, addressing wide-scale evidence synthesis and context saturation in LLM agents.

Ax Qianshan Wei, Yishan Yang, Siyi Wang, Jinglin Chen, Binyu Wang, Jiaming Wang, Shuang Chen, Zechen Li, Yang Shi, Yuqi Tang, Weining Wang, Yi Yu, Chaoyou Fu, Qi Li, Yi-Fan Zhang 29d ago

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Benchmark evaluating multimodal LLM agents with tool integration capabilities including visual expansion and web search through agentic reasoning.

Ax Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, Amaury Hayat 29d ago

Automatic Textbook Formalization

AI system that automatically formalizes a 500+ page graduate-level algebraic combinatorics textbook in Lean, producing 130K lines of formal code.

Ax Yunfei Bai, Amit Dhanda, Shekhar Jain 29d ago

Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

Reinforcement learning approach to improve visual reasoning in chart question answering using vision language models with policy optimization.

Ax Maximiliano Armesto, Christophe Kolb 29d ago

Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding

Framework for agentic AI emphasizing control, memory, and verifiable action under partial observability, inspired by squirrel ecology comparisons.

Ax Jakob Prange, Nathan Schneider, Lingpeng Kong 29d ago

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Evaluates linguistic graph representations combined with pretrained Transformers for language modeling, comparing semantic and syntactic formalisms.

Ax Jakob Prange, Man Ho Ivy Wong 29d ago

Reanalyzing L2 Preposition Learning with Bayesian Mixed Effects and a Pretrained Language Model

Bayesian and neural models analyzing Chinese learners' English preposition comprehension, using pretrained language models for linguistic analysis.

Ax Jakob Prange, Emmanuele Chersoni 29d ago

Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures

Research on language modeling with predicted semantic structure, establishing empirical lower bounds for performance improvements using binary vector representations.

Ax Mohammad Rezaei, Jens Lehmann, Sahar Vahdati 29d ago

LLM Reasoning with Process Rewards for Outcome-Guided Steps

Reinforcement learning approach using process rewards to provide intermediate feedback for multi-step mathematical reasoning in LLMs.

Ax Roy Rinberg, Annabelle Michael Carrell, Simon Henniger, Nicholas Carlini, Keri Warr 29d ago

Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains

Study of LLM-generated text compression using domain-adapted LoRA and arithmetic coding, characterizing lossless and lossy compression frontiers.

Ax Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, Xinyi Wang, Wei Dai, Gang Cao, Yuetang Deng, Zhi Gong, Dezhi Ran, Linyi Li, Wei Yang, Tao Xie 29d ago

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

Framework for scaling GUI agents using synthetic environmental dynamics and self-supervised learning from ground-truth interaction feedback.

Ax Tianyu Liu, Sihan Jiang, Fan Zhang, Kunyang Sun, Teresa Head-Gordon, Hongyu Zhao 29d ago

DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery

Benchmark for evaluating LLMs and embeddings on drug discovery tasks including hypothesis generation and candidate prioritization.

Ax Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang 29d ago

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

Offline preference-based RL method improving query efficiency by addressing exploration and preference ranking within existing datasets.

Ax Venkatakrishna Reddy Oruganti 29d ago

Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility

Neural architecture performing discrete symbolic constraint reasoning while maintaining differentiability for planning and feasibility checking.

Ax Sophie Weidmann, Fernando Castor 29d ago

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

Study using contrastive prompt tuning to optimize LLMs for generating energy-efficient code supporting Green Software Development.

Ax Thomas Pravetz 29d ago

Prism: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning

Framework for zero-shot transfer between RL agents using interpretable discrete concepts validated through causal intervention.

Ax Gaoxiang Cao, Wenke Yuan, Yunpeng Hou, Huasen He, Quan Zheng, Jian Yang 29d ago

Dynamic Mask Enhanced Intelligent Multi-UAV Deployment for Urban Vehicular Networks

Dynamic UAV deployment system for vehicular networks using Q-learning with action masking to enhance reliability in urban environments.

Ax May Lynn Reese, Markela Zeneli, Mindy Ng, Jacob Haimes, Andreea Damien, Elizabeth Stade 29d ago

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

Framework using LLMs as judges to evaluate safety of model responses for users with psychosis, addressing clinical validation gaps in mental health.

Ax Raul Suzuki, Rodrigo Moreira, Pedro Henrique A. Damaso de Melo, Larissa F. Rodrigues Moreira, Flávio de Oliveira Silva 29d ago