Text Over Image: Auditing Multimodal Robustness in Synthetic Medical Image Detection

arXiv CS.AI


CC BY

This outlet displays the body text directly under a public or free license.

Abstract

With the rapid adoption of generative AI, synthetic medical images pose growing risks, including diagnostic deception and insurance fraud.

Although prior work has explored vision-language model (VLM)-based synthetic image detection, these evaluations typically consider images in isolation.

In clinical practice, however, images are interpreted alongside structured records and metadata, and VLMs are increasingly deployed under joint image-record inputs.

We uncover a previously underexamined multimodal vulnerability: when given both modalities, VLMs may overweight record context in authenticity judgments, such that the same image receives different predictions solely due to changes in its accompanying text.

This raises concerns about robustness in real-world deployment.

To systematically characterize this effect, we reformulate synthetic medical image detection as an audit of multimodal robustness at the image-record interface and introduce a paired benchmark that holds the image fixed while swapping controlled metadata variants.

Across multiple imaging modalities, we evaluate diverse open-weight and frontier API VLMs and find that changing the metadata context alone can flip authenticity judgments, with accuracy on authentic images dropping by 61.1% on average under an explicit AI-origin tag.
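The paired design described above lends itself to a simple per-variant comparison. The following is a minimal sketch, not the authors' evaluation code: the record keys (`image_id`, `variant`, `label`, `pred`), the variant name `neutral`, and the function name are all hypothetical, and the accuracy drop is computed here as an absolute difference, which may differ from how the reported 61.1% figure was obtained.

```python
from collections import defaultdict


def audit_paired_predictions(records, baseline="neutral"):
    """Compare each metadata variant against a baseline on the same images.

    `records` is an iterable of dicts with hypothetical keys:
    image_id, variant, label ("authentic" or "synthetic"), pred (same values).
    """
    # Index predictions as variant -> image_id -> record.
    by_variant = defaultdict(dict)
    for rec in records:
        by_variant[rec["variant"]][rec["image_id"]] = rec

    base = by_variant.get(baseline, {})
    report = {}
    for variant, rows in by_variant.items():
        if variant == baseline:
            continue
        # Only images seen under both conditions form valid pairs.
        shared = [i for i in rows if i in base]
        if not shared:
            continue
        # A flip is a changed prediction on an unchanged image.
        flips = sum(rows[i]["pred"] != base[i]["pred"] for i in shared)
        entry = {"n_pairs": len(shared), "flip_rate": flips / len(shared)}
        authentic = [i for i in shared if base[i]["label"] == "authentic"]
        if authentic:
            n = len(authentic)
            base_acc = sum(base[i]["pred"] == "authentic" for i in authentic) / n
            var_acc = sum(rows[i]["pred"] == "authentic" for i in authentic) / n
            # Absolute drop in accuracy on authentic images (assumed definition).
            entry["authentic_accuracy_drop"] = base_acc - var_acc
        report[variant] = entry
    return report
```

Because the image is identical within each pair, any nonzero flip rate in this sketch is attributable to the metadata text alone, which is the effect the benchmark is designed to isolate.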

We further propose an inference-time mitigation pipeline that detects and neutralizes provenance shortcuts without model retraining, substantially outperforming direct prompt-based suppression on the affected subset.
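The abstract does not describe how the mitigation pipeline works internally, so the following is only an illustrative sketch of the general idea of detecting and neutralizing provenance cues in a record before it reaches the model. The field names, the keyword pattern, and the function `neutralize_record` are assumptions made for this example and are not taken from the paper.

```python
import re

# Hypothetical field names that directly state where an image came from.
PROVENANCE_KEYS = {"provenance", "origin", "source"}

# Hypothetical keyword pattern for provenance claims hidden in free text.
PROVENANCE_PATTERN = re.compile(
    r"\b(?:ai[- ]?generated|synthetic|deepfake)\b", re.IGNORECASE
)


def neutralize_record(record):
    """Return (clean_record, flagged_keys) for a flat metadata dict.

    A field is flagged if its key is a known provenance field or if its
    value, rendered as text, matches the provenance keyword pattern.
    Flagged fields are dropped so the model cannot use them as a shortcut.
    """
    clean = {}
    flagged = []
    for key, value in record.items():
        is_provenance_key = key.lower() in PROVENANCE_KEYS
        has_provenance_text = bool(PROVENANCE_PATTERN.search(str(value)))
        if is_provenance_key or has_provenance_text:
            flagged.append(key)
        else:
            clean[key] = value
    return clean, sorted(flagged)
```

A rule-based filter like this runs at inference time and needs no retraining, which matches the constraint stated above, but a real pipeline would likely need a more robust detector than a fixed keyword list.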

Our benchmark provides a standardized tool for assessing and improving multimodal robustness beyond image-only settings.

Code and data will be released upon acceptance.
