
NAPReg: Nouns As Proxies Regularization for Semantically Aware Cross-Modal Embeddings

  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

Cross-modal retrieval is a fundamental vision-language task with a broad range of practical applications. Text-to-image matching is the most common form of cross-modal retrieval: given a large database of images and a textual query, the task is to retrieve the most relevant set of images. Existing methods use dual encoders with an attention mechanism and a ranking loss to learn embeddings that can be used for retrieval based on cosine similarity. Although these methods attempt to perform semantic alignment across visual regions and textual words using tailored attention mechanisms, there is no explicit supervision from the training objective to enforce such alignment. To address this, we propose NAPReg, a novel regularization formulation that projects high-level semantic entities, i.e., nouns, into the embedding space as shared learnable proxies. We show that this formulation allows the attention mechanism to learn better word-region alignment while also using region information from other samples to build a more generalized latent representation for semantic concepts. Experiments on three benchmark datasets (MS-COCO, Flickr30k, and Flickr8k) demonstrate that our method achieves state-of-the-art results in cross-modal metric learning for text-image and image-text retrieval tasks. Code: https://github.com/bhavinjawade/NAPReq
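
The retrieval setup described in the abstract (embeddings compared by cosine similarity, trained with a ranking loss, plus a proxy-based regularizer) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names, the margin value, the sum-of-hinges form of the ranking loss, and in particular `proxy_regularizer`, which is a generic proxy-softmax stand-in and not the NAPReg formulation, since the abstract does not specify that loss.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cosine_similarity_matrix(text_emb, image_emb):
    """Return S with S[i, j] = cos(text_i, image_j)."""
    return l2_normalize(text_emb) @ l2_normalize(image_emb).T

def retrieve_top_k(text_emb, image_emb, k=1):
    """For each text query, return indices of the k most similar images, best first."""
    sims = cosine_similarity_matrix(text_emb, image_emb)
    return np.argsort(-sims, axis=1)[:, :k]

def hinge_ranking_loss(sims, margin=0.2):
    """Bidirectional sum-of-hinges ranking loss (assumed form).

    Matched text-image pairs are assumed to lie on the diagonal of `sims`.
    """
    pos = np.diag(sims)
    # Text -> image: every other image in row i is a negative for text i.
    cost_t2i = np.maximum(0.0, margin + sims - pos[:, None])
    # Image -> text: every other text in column j is a negative for image j.
    cost_i2t = np.maximum(0.0, margin + sims - pos[None, :])
    off_diag = ~np.eye(sims.shape[0], dtype=bool)
    return float(cost_t2i[off_diag].sum() + cost_i2t[off_diag].sum())

def proxy_regularizer(region_emb, noun_ids, proxies):
    """Hypothetical stand-in, NOT the paper's loss.

    Softmax cross-entropy over cosine similarities between region embeddings
    and one proxy vector per noun; `noun_ids[i]` is the noun assigned to region i.
    """
    logits = l2_normalize(region_emb) @ l2_normalize(proxies).T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(noun_ids)), noun_ids].mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(4, 8))                  # 4 toy caption embeddings
    image = text + 0.01 * rng.normal(size=(4, 8))   # matching images, lightly perturbed
    print(retrieve_top_k(text, image, k=1).ravel())
    print(hinge_ranking_loss(cosine_similarity_matrix(text, image)))
```

In this sketch the proxies would be trainable parameters shared across all samples, so regions from different images that depict the same noun are pulled toward one vector; how the paper realizes that idea should be taken from the paper and its code, not from this example.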

Original language: English
Title of host publication: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1135-1144
Number of pages: 10
ISBN (Electronic): 9781665493468
State: Published - 2023
Event: 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023 - Waikoloa, United States
Duration: Jan 3 2023 → Jan 7 2023

Publication series

Name: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023

Conference

Conference: 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
Country/Territory: United States
City: Waikoloa
Period: 01/3/23 → 01/7/23

Keywords

  • Algorithms: Vision + language and/or other modalities
  • Image recognition and understanding (object detection, categorization, segmentation, scene modeling, visual reasoning)
  • Machine learning architectures, formulations, and algorithms (including transfer, low-shot, semi-, self-, and un-supervised learning)
