TY - GEN
T1 - Text-Image De-Contextualization Detection Using Vision-Language Models
AU - Huang, Mingzhen
AU - Jia, Shan
AU - Chang, Ming Ching
AU - Lyu, Siwei
N1 - Publisher Copyright: © 2022 IEEE
PY - 2022
Y1 - 2022
N2 - Text-image de-contextualization, which uses inconsistent image-text pairs, is an emerging form of misinformation that is drawing increasing attention because of the threat it poses to information authenticity. Because the content is real but semantically mismatched across modalities, detecting de-contextualization is a challenging problem in media forensics. Inspired by recent advances in vision-language models, which learn powerful relationships between images and texts, we apply vision-language models to the media de-contextualization detection task. Two popular models, CLIP and VinVL, are evaluated and compared on several news and social media datasets to show their performance in detecting image-text inconsistency in de-contextualization. We also summarize interesting observations and shed light on the use of vision-language models in de-contextualization detection.
AB - Text-image de-contextualization, which uses inconsistent image-text pairs, is an emerging form of misinformation that is drawing increasing attention because of the threat it poses to information authenticity. Because the content is real but semantically mismatched across modalities, detecting de-contextualization is a challenging problem in media forensics. Inspired by recent advances in vision-language models, which learn powerful relationships between images and texts, we apply vision-language models to the media de-contextualization detection task. Two popular models, CLIP and VinVL, are evaluated and compared on several news and social media datasets to show their performance in detecting image-text inconsistency in de-contextualization. We also summarize interesting observations and shed light on the use of vision-language models in de-contextualization detection.
KW - de-contextualization
KW - online misinformation
KW - out-of-text detection
KW - text-image inconsistency
UR - https://www.scopus.com/pages/publications/85131251006
U2 - 10.1109/ICASSP43922.2022.9746193
DO - 10.1109/ICASSP43922.2022.9746193
M3 - Conference contribution
AN - SCOPUS:85131251006
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 8967
EP - 8971
BT - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022
Y2 - 22 May 2022 through 27 May 2022
ER -