Isolater - Feed

HN appsecsanta 11d ago

The Rise of AI Pentesting Agents: A Technical Analysis (2026)

Technical analysis of the evolution of AI pentesting agents, from PentestGPT to autonomous agents like PentAGI and XBOW.

HN aituglo 11d ago

The state of bug bounty in 2026

Essay on bug bounty trends in 2026. Discusses AI agent effectiveness for vulnerability discovery and program management challenges.

HN vanardev 11d ago

XBPP – Open standard for governing AI agent payments (Apache 2.0)

Apache 2.0 open standard for governing AI agent payment requests. Policy engine with 12 configurable checks for payment authorization.

HN atulanand94 11d ago

Open source 1040 tax software built by AI agents

Open-source tax software built and maintained by autonomous AI agents. Uses IRS publications as its source and applies self-improving agent loops.

HN dev_tools_lab 11d ago

The Star Chamber: Why Multi-LLM Consensus Is Now a Necessity for Code Quality

Tool for multi-LLM code review consensus. Aggregates feedback from multiple models to identify blind spots and improve code quality assessment.

HN jonadas 11d ago

Beyond Karpathy's LLM-Wiki: The Necessity of Cognitive Governance

Essay on LLM-based knowledge management limitations. Discusses problems with AI-generated note synthesis and cognitive organization.

HN hpbyte 11d ago

Show HN: Rocky-Project Hail Mary agent skill that cut output tokens ~47%

Agent skill implementation for token compression. Reduces output tokens by ~47% while maintaining readability.

HN AkshatVirmani 11d ago

State of API Security 2026: An AI-Native Testing Perspective

Security report on 1.4M AI-driven API test executions. Maps vulnerabilities to OWASP Top 10 using agentic testing.

BL 11d ago

Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI

Cloudflare expands access to OpenAI's frontier models via Agent Cloud platform, enabling enterprises to deploy AI agents for customer support, system updates, and report generation.

Ax Yousra Fettach, Guillaume Bied, Hannu Toivonen, Tijl De Bie 12d ago

Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models

Benchmark evaluating humor alignment across frontier LLMs using Cards Against Humanity gameplay, analyzing model performance vs human baseline on comedic response selection.

Ax Zhuoyi Yang, Jiapeng Yu, Reuben Tan, Boyang Li, Huijuan Xu 12d ago

InstrAct: Towards Action-Centric Understanding in Instructional Videos

InstrAction pretraining framework for video foundation models to improve action recognition in instructional videos by addressing static bias in temporal understanding.

Ax Arda Atalik, Hui Xue, Rhodri H. Davies, Thomas A. Treibel, Daniel K. Sodickson, Michael S. Hansen, Peter Kellman 12d ago

PSIRNet: Deep Learning-based Free-breathing Rapid Acquisition Late Enhancement Imaging

Deep learning method for cardiac MRI imaging using phase-sensitive inversion recovery to reduce acquisition time and motion artifacts in late gadolinium enhancement scans.

Ax Mahdi Alizadeh 12d ago

eBandit: Kernel-Driven Reinforcement Learning for Adaptive Video Streaming

eBandit uses eBPF and multi-armed bandit reinforcement learning in the Linux kernel for adaptive video bitrate selection with improved network signal visibility.

Ax Sophie Wu, Andrew Piper 12d ago

Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation

Evaluates cultural alignment of LLMs across 14 language-culture pairs using a multilingual story moral generation task and dataset.

Ax Marc Böhlen, Sai Krishna 12d ago

Scrapyard AI

Investigates opportunities for resource-constrained AI research using obsolete yet capable discarded models from AI production cycles.

Ax Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson, Myles Foley, Ed Chapman, Nickolas Espinosa Dice, Ankita Samaddar, Joshua Sylvester, Himanshu Neema, Nicholas Butts, Nate Foster, Ahmad Ridley, Zoe M, Paul Jones 12d ago

Building Better Environments for Autonomous Cyber Defence

Workshop report on designing reinforcement learning environments for autonomous cyber defense applications.

Ax Fatih Cagatay Akyon, Alptekin Temizel 12d ago

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

SenBen large-scale scene graph benchmark for explainable content moderation with visual grounding and sensitivity annotations.

Ax Mehran Taghian, Yunke Peng, Xing Huang, Yao Wang, Yaoyuan Wang, Wei Guo, Yuanyong Luo, Tianchi Hu, Junsong Wang, Xin Wang, Hu Liu, Yu Cheng, Ziwei Yu, Hongliang Li, Mehdi Rahimifar, Lei Yan, Xuefei Wang, Zhuang Ma, Lei Liu, Hui Yu, Anandharaju Durai Raju, Hoang Le, Hei Yi Mak, Tanzila Rahman, Shadan Golestan 12d ago

HiFloat4 Format for Language Model Pre-training on Ascend NPUs

HiFloat4 low-precision floating-point format for efficient 4-bit LLM pre-training on Ascend NPU hardware.

Ax Jinqi Luo, Jinyu Yang, Tal Neiman, Lei Fan, Bing Yin, Son Tran, Mubarak Shah, René Vidal 12d ago

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

Dictionary-aligned concept control method for safeguarding multimodal LLMs against malicious queries at inference time.

Ax Cyrus Zhou, Yufei Jin, Yilin Xu, Yu-Chiang Wang, Chieh-Ju Chao, Monica S. Lam 12d ago

Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching

Constraint-satisfaction-based retrieval system for matching patient profiles to clinical trials with high recall and precision.

Ax Greg Nyilasy, Brock Bastian, Jennifer Overbeck, Abraham Ryan Ade Putra Hito 12d ago

AI-Induced Human Responsibility (AIHR) in AI-Human teams

Empirical study on how humans allocate responsibility in AI-human hybrid workflows using AI-assisted lending experiments.

Ax Mintong Kang, Chen Fang, Bo Li 12d ago

AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models

AudioGuard framework for comprehensive audio safety protection including voice impersonation, speaker attributes, and compositional harms.

Ax Mohammed Maaz Sibhai, Abedalrhman Alkhateeb, Saad B. Ahmed 12d ago

MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification

MedFormer-UR transformer with uncertainty quantification for safe medical image classification in clinical settings.

Ax Rafael da Silva, Jeff Eicher, Gregory Longo 12d ago

Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations

Survival-oriented benchmark for temporal student dropout risk modeling using the Open University Learning Analytics Dataset.

Ax Rafael da Silva, Jeff Eicher, Gregory Longo 12d ago

A Mathematical Framework for Temporal Modeling and Counterfactual Policy Simulation of Student Dropout

Temporal survival modeling framework for predicting student dropout using LMS engagement data and administrative records.

Ax Tokio Kajitsuka, Ukyo Honda, Sho Takase 12d ago

Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective

Re-examines capacity gap in chain-of-thought distillation, finding student models often outperform teacher distillation baselines.

Ax Chengjie Fan, Cong Pan, Zijian Liu, Ningzhong Liu, Jie Qin 12d ago

HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation

HTNav framework for aerial vision-and-language navigation combining visual perception with language instructions in urban environments.

Ax Xinyu Zhang, Zurong Mai, Qingmei Li, Zjin Liao, Yibin Wen, Yuhang Chen, Xiaoya Fan, Chan Tsz Ho, Bi Tianyuan, Haoyuan Liang, Ruifeng Su, Zihao Qian, Juepeng Zheng, Jianxi Huang, Yutong Lu, Haohuan Fu 12d ago

HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

HM-Bench benchmark evaluates multimodal LLMs on hyperspectral remote sensing image understanding tasks.

Ax Hang Gao, Kunyu Li, Huang Hong, Baoquan Cui, Fengge Wu 12d ago

A Closer Look at the Application of Causal Inference in Graph Representation Learning

Analysis of causal inference methods applied to graph representation learning and their limitations with graph-structured data.

Ax Mohsen Yaghoubi Suraki 12d ago

Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS)

ADRUwAMS deep learning model with attention mechanisms for automated brain tumor glioma segmentation in medical imaging.

Ax Zecheng Hao, Shenghao Xie, Kang Chen, Wenxuan Liu, Zhaofei Yu, Tiejun Huang 12d ago

Ge²mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer

Ge²mS-T improves energy efficiency in Spiking Vision Transformers through multi-dimensional grouping and optimized training methods.

Ax Yuanting Fan, Jun Liu, Bin-Bin Gao, Xiaochen Chen, Yuhuan Lin, Zhewei Dai, Jiawei Zhan, Chengjie Wang 12d ago

Large-Scale Universal Defect Generation: Foundation Models and Datasets

UDG dataset with 300K samples for training defect/anomaly generation models with improved generalization across defect categories.

Ax Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo 12d ago

Beyond Relevance: Utility-Centric Retrieval in the LLM Era

RAG systems should optimize for utility (task completion) rather than topical relevance when retrieving documents for LLMs.

Ax Rares-Alexandru Roscan, Gabriel Petre, Adrian-Marius Dumitran, Angela-Liliana Dumitran 12d ago

MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator

MuTSE: Human-in-the-loop evaluator tool for systematically comparing LLM text simplification outputs across different prompting strategies and architectures.

Ax Mintae Kim, Koushil Sreenath 12d ago

WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning

WOMBET: Framework for reinforcement learning that generates and transfers experience data between source and target robotic tasks for sample efficiency.

Ax Keyu Li, Jin Gao, Dequan Wang 12d ago

Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems

Aligned Agents, Biased Swarm: Empirical study measuring how multi-agent system topologies and feedback loops amplify bias in emergent behaviors.

Ax Avni Mittal, Shanu Kumar, Sandipan Dandapat, Monojit Choudhury 12d ago

Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models

Litmus ReAgent: Benchmark and agentic system for evaluating multilingual LLM performance prediction across 1,500 questions spanning six tasks and five evidence scenarios.

Ax Yi Luo, Xu Sun, Guangchun Luo, Aiguo Chen 12d ago

Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning

Neighbourhood Transformer: Graph neural network architecture using switchable attention to handle heterophilic graph learning where dissimilar nodes are frequently connected.

Ax Jihwan Oh, Soowon Oh, Murad Aghazada, Minchan Jeong, Sungnyun Kim, Se-Young Yun 12d ago

PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment

PerMix-RLVR: Training method for aligning LLM personas with reward models while preserving output diversity, avoiding inference-time computation overhead.

Ax Zhiyu Zhou, Peilin Liu, Ruoxuan Zhang, Luyang Zhang, Cheng Zhang, Hongxia Xie, Wen-Huang Cheng 12d ago

PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos

PinpointQA dataset and benchmark for evaluating small object localization and spatial reasoning in video MLLMs.

Ax Xiaoke Guo, Songze Li, Zhiqiang Liu, Zhaoyan Gong, Yuanxiang Liu, Huajun Chen, Wen Zhang 12d ago

ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering

ASTRA: adaptive semantic tree reasoning architecture for LLM-based complex table question answering.

Ax Wenxi Li, Xihao Wang, Weiwei Sun 12d ago

Towards Linguistically-informed Representations for English as a Second or Foreign Language: Review, Construction and Application

Survey and construction of linguistically-informed representations for English as a second/foreign language.

Ax Carlos Jimeno Miguel, Raul Orduna, Francesco Zola 12d ago

Identification and Anonymization of Named Entities in Unstructured Information Sources for Use in Social Engineering Detection

Named entity identification and anonymization system for cybercrime datasets using speech-to-text and image processing.

Ax Andre Bacellar 12d ago