Isolater - Feed

Ax Sehun Kim 12d ago

Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture

Self-supervised learning approach for ECG signal representation using masked modeling from unlabeled medical data.

Ax Sajib Kumar Saha Joy, Arman Hassan Mahy, Meherin Sultana, Azizah Mamun Abha, MD Piyal Ahmmed, Yue Dong, G M Shahariar 12d ago

Mitigating Extrinsic Gender Bias for Bangla Classification Tasks

Investigation of gender bias in Bangla language models with benchmark datasets for sentiment analysis, toxicity detection, hate speech, and sarcasm.

Ax Yangyang Li, Daqing Liu, Wu Liu, Allen He, Xinchen Liu, Yongdong Zhang, Guoqing Jin 12d ago

OmniPrism: Learning Disentangled Visual Concept for Image Generation

Method for learning disentangled visual concepts in image generation to improve multi-aspect creative generation while reducing concept confusion.

Ax Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, Peizhong Ju, A. B. Siddique 12d ago

Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution

Analysis of polysemanticity in LLMs revealing neurons exhibit multiple semantic meanings, challenging discrete neuron attribution for model interpretation.

Ax Jinghua Piao, Yuwei Yan, Jun Zhang, Nian Li, Junbo Yan, Xiaochong Lan, Zhihong Lu, Zhiheng Zheng, Jing Yi Wang, Di Zhou, Chen Gao, Fengli Xu, Fang Zhang, Ke Rong, Jun Su, Yong Li 12d ago

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

Research on large-scale simulation of LLM-driven generative agents for studying human behavior and social dynamics through computational approaches.

Ax Hao-Xiang Xu, Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu 12d ago

Constraining Sequential Model Editing with Editing Anchor Compression

Sequential model editing method with editing anchor compression to constrain parameter drift and maintain LLM general abilities during knowledge updates.

Ax Junhao Liu, Haonan Yu, Zhenyu Yan, Xin Zhang 12d ago

Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models

Budget-friendly proxy model framework for post-hoc interpretability of LLMs, enabling actionable explanations for prompt engineering and optimization.

Ax Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye 12d ago

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Autoregressive super-resolution framework decomposing extreme upsampling into intermediate scales with preference alignment for improved scalability.

Ax Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Yu-Lun Liu, Chih-Yu Wang, Jun-Cheng Chen 12d ago

Gen-n-Val: Agentic Image Data Generation and Validation

Agentic framework for synthetic image data generation and validation addressing data scarcity and label noise in vision tasks like detection and segmentation.

Ax Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani 12d ago

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations

Safety enhancement for medical vision-language models using synthetic demonstrations to improve rejection of harmful clinical queries.

Ax Alexander Gambashidze, Li Pengyi, Matvey Skripkin, Andrey Galichin, Anton Gusarov, Konstantin Sobolev, Andrey Kuznetsov, Ivan Oseledets 12d ago

Listener-Rewarded Thinking in VLMs for Image Preferences

Listener-rewarded thinking approach using reinforcement learning to train robust reward models for generative text-to-image and video models.

Ax Haoyu Zhang, Shihao Zhang, Ian Colbert, Rayan Saab 12d ago

Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos

Theoretical analysis providing quantitative guarantees for post-training quantization methods OPTQ and Qronos applied to LLMs and neural networks.

Ax Jianxiang He, Meisheng Hong, Jungang Li, Weiyu Guo, Xuming Hu, Hui Xiong 12d ago

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

Keyframe selection method using visual subtitles for improved long video understanding with multimodal LLMs under context length constraints.

Ax Artzai Picon, Itziar Eguskiza, Daniel Mugica, Javier Romero, Carlos Javier Jimenez, Eric White, Gabriel Do-Lago-Junqueira, Christian Klukas, Ramon Navarra-Mestre 12d ago

Mitigating Domain Drift in Multi Species Segmentation with DINOv2: A Cross-Domain Evaluation in Herbicide Research Trials

DINOv2-based segmentation framework for plant species and damage detection in herbicide trials, addressing domain drift across real-world conditions.

Ax Sebastian Lubos, Alexander Felfernig, Damian Garber, Gerhard Leitner, Julian Schwazer, Manuel Henrich 12d ago

Investigating Multimodal Large Language Models to Support Usability Evaluation

Investigation of multimodal LLMs for automating usability evaluation of user interfaces by analyzing visual UI context and textual instructions.

Ax Chen Zeng, Tiehang Xu, Qiao Wang 12d ago

AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting

Kolmogorov-Arnold Network variant with autoregressive weights for time series forecasting, comparing performance against LLMs and ARIMA.

Ax Hao Chen, Tao Han, Jie Zhang, Song Guo, Lei Bai 12d ago

STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting

Spatial-temporal weather forecasting model with adaptive boundary alignment for improved global and regional predictions.

Ax Rongguang Ye, Ming Tang, Edith C. H. Ngai 12d ago

On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs

Configuration-aware LoRA adaptation for efficient fine-tuning of quantized LLMs on heterogeneous edge devices with privacy preservation.

Ax Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok 12d ago

Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search

Monte Carlo Tree Search approach for multi-attribute controllable summarization without per-attribute fine-tuning, enabling flexible constraint satisfaction.

Ax Han Zhou, Jinjin Cao, Liyuan Ma, Xueji Fang, Guo-jun Qi 12d ago

Traj2Action: A Co-Denoising Framework for Trajectory-Guided Human-to-Robot Skill Transfer

Co-denoising framework for transferring manipulation skills from human videos to robots by bridging morphological differences.

Ax Danial Samadi Vahdati, Tai Duc Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm 12d ago

Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

Security research on defending AI-based videoconferencing systems against pose-expression latent hijacking attacks using biometric detection.

Ax Zhepeng Cen, Haolin Chen, Shiyu Wang, Zuxin Liu, Zhiwei Liu, Jielin Qiu, Ding Zhao, Silvio Savarese, Caiming Xiong, Huan Wang, Weiran Yao 12d ago

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Automated pipeline for scaling reinforcement learning datasets to pretraining scale, addressing data bottleneck in RL for LLM training.

Ax Shaokai Wu, Yanbiao Ji, Qiuchang Li, Zhiyi Zhang, Qichen He, Wenyuan Xie, Guodong Zhang, Bayram Bayramli, Yue Ding, Hongtao Lu 12d ago

Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

Post-deployment learning framework for Vision-Language-Action policies using retrieved execution memories to improve embodied agent performance.

Ax Yuquan Xue, Guanxing Lu, Zhenyu Wu, Chuanrui Zhang, Bofang Jia, Zhengyi Gu, Ziwei Wang 12d ago

RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation

Data augmentation framework for robotic manipulation using Vision-Language-Action models to improve learning from limited demonstration datasets.

Ax Thaweerath Phisannupawong, Joshua Julian Damanik, Han-Lim Choi 12d ago

LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation

LLM-based framework for predicting flight delays using textual aeronautical information and aircraft trajectory data for air traffic management.

Ax Taha Yasseri, Saeedeh Mohammadi 12d ago

How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison

Computational analysis comparing 17,790 articles between Grokipedia (AI-generated) and Wikipedia examining textual and structural biases.

Ax Seunghee Han, Yeonghun Kang, Taeun Bae, Junho Kim, Younghun Kim, Varinia Bernales, Alan Aspuru-Guzik, Jihan Kim 12d ago

EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture

EGMOF: hybrid diffusion-transformer for metal-organic framework generation with inverse design capabilities for materials discovery.

Ax Domício Pereira Neto, João Correia, Penousal Machado 12d ago

Evolutionary Optimization Trumps Adam Optimization on Embedding Space Exploration

Inference-time optimization using evolutionary algorithms on prompt embeddings for diffusion model control without fine-tuning.

Ax Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha 12d ago

Structured Uncertainty guided Clarification for LLM Agents

Structured uncertainty framework for LLM agents with tool-calling to generate principled clarifying questions for ambiguous user instructions.

Ax Zhirui Liu, Kaiyang Ji, Ke Yang, Jingyi Yu, Ye Shi, Jingya Wang 12d ago

Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

Language-conditioned humanoid robot control using LLM with unified motion vocabulary for free-form command execution and embodied AI.

Ax Anik De, Abhirama Subramanyam Penamakuri, Rajeev Yadav, Aditya Rathore, Harshiv Shah, Devesh Sharma, Sagar Agarwal, Pravin Kumar, Anand Mishra 12d ago

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

Bharat Scene Text dataset and benchmark for Indian language scene text recognition addressing script diversity and font variations.

Ax Le Thien Phuc Nguyen, Zhuoran Yu, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee 12d ago

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

AV-SpeakerBench: multimodal LLM benchmark with 3,212 questions evaluating audiovisual speech understanding and speaker-speech alignment in video.

Ax Haoming Liu, Jinnuo Liu, Yanhao Li, Liuyang Bai, Yunkai Ji, Yuanhe Guo, Shenji Wan, Hongyi Wen 12d ago

From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity

Analysis of flow-based diffusion models revealing two-stage behavior through oracle velocity field computation and memorization-generalization tradeoffs.

Ax Melane Navaratnarajah, David A. Kelly, Hana Chockler 12d ago

Out-of-the-box: Black-box Causal Attacks on Object Detectors

Research on adversarial perturbations for object detectors using black-box attacks to expose vulnerabilities and understand attack mechanisms.

Ax Zayne Sprague, Jack Lu, Manya Wadhwa, Sedrick Keh, Mengye Ren, Greg Durrett 12d ago

SkillFactory: Self-Distillation For Learning Cognitive Behaviors

Research on self-distillation methods for teaching language models to leverage cognitive skills like verification and backtracking without base model exposure.

Ax Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, Yuheng Li 12d ago

Relational Visual Similarity

Research on relational visual similarity in computer vision showing how humans perceive analogical relationships beyond attribute similarity.

Ax Qiushi Han, David Simchi-Levi, Renfei Tan, Zishuo Zhao 12d ago

Multi-agent Adaptive Mechanism Design

Framework combining mechanism design and online learning for sequential mechanism design where principal learns agent beliefs while ensuring truthfulness.

Ax Zibo Zhao (Arizona State University), Yuanting Zha (ShanghaiTech University), Haipeng Zhang (ShanghaiTech University), Xingcheng Xu (Shanghai Artificial Intelligence Laboratory) 12d ago

The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs

Mechanistic study of how self-reflection emerges in RL-trained LLMs, proposing a two-stage decision-sampling hypothesis to explain how unified optimization produces distinct capabilities.

Ax Frank Mollard, Marcus Becker, Florian Roehrbein 12d ago

Adversarial Evasion Attacks on Computer Vision using SHAP Values

White-box adversarial attack method on computer vision models using SHAP values to generate imperceptible evasion attacks.

Ax Jianan Wang, Nailei Hei, Li He, Huanzhen Wang, Aoxing Li, Yingkai Zhao, Yuxuan Lin, Haofen Wang, Chunyang Wang, Yan Wang, Wenqiang Zhang 12d ago

Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation

Training-free framework for human video animation using cached reference frames to model long-range dependencies while preserving temporal coherence.

Ax Safal Shrestha, Anubhav Shrestha, Aadim Nepal, Minwu Kim, Keith Ross 12d ago

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

Analysis showing that layer pruning degrades LLM performance on generative reasoning tasks beyond surface-level degradation, causing loss of algorithmic capabilities.

Ax Bryan Sangwoo Kim, Jonghyun Park, Jong Chul Ye 12d ago

Tiled Prompts: Overcoming Prompt Misguidance in Image and Video Super-Resolution

Method addressing prompt misguidance in diffusion-based super-resolution by using tiled prompts for localized semantic guidance.

Ax Indraveni Chebolu, Arnab Mallick, Harmesh Rana 12d ago

SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing

Multi-agent framework for smart contract auditing using specialized agents for planning, execution, and recovery with coordination protocols.

Ax Eun Cheol Choi, Lindsay E. Young, Emilio Ferrara 12d ago

Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Study demonstrating LLM biases when simulating misinformation susceptibility, showing models overstate attitudes and ignore network effects present in humans.

Ax Prerna Ravi, Carúmey Stevens, Beatriz Flamia Azevedo, Jasmine David, Brandon Hanks, Hal Abelson, Grace Lin, Emma Anderson 12d ago

Exploring Teachers' Perspectives on Using Conversational AI Agents for Group Collaboration

Qualitative study of 33 K-12 teachers' perspectives on using conversational AI agents to scaffold group collaboration in classrooms.

Ax Adolfo González, Víctor Parada 12d ago

An Adaptive Model Selection Framework for Demand Forecasting under Horizon-Induced Degradation to Support Business Strategy and Operations

Adaptive framework for demand forecasting model selection addressing horizon-induced performance degradation in inventory planning.

Ax Rong Fu, Zijian Zhang, Kun Liu, Jiekai Wu, Xianda Li, Simon Fong 12d ago

SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

Pipeline combining subquadratic retrieval and GPU-accelerated kernels for analyzing immune repertoires at population scale.

Ax Joao Manoel Herrera Pinheiro, Gabriela Do Nascimento Herrera, Luciana Bueno Dos Reis Fernandes, Alvaro Doria Dos Santos, Ricardo V. Godoy, Eduardo A. B. Almeida, Helena Carolina Onody, Marcelo Andrade Da Costa Vieira, Angelica Maria Penteado-Dias, Marcelo Becker 12d ago