
PLVM: A Tuning-Free Approach for Personalized Large Vision-Language Model

  • SUNY Buffalo
  • New York University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Personalization has gained significant attention in image generation yet remains underexplored for large vision-language models (LVLMs). With personalization, LVLMs can conduct interactive dialogues using referential concepts (e.g., 'Mike and Susan are talking.') instead of generic descriptions (e.g., 'a boy and a girl are talking.'), making the conversation more customizable and referentially friendly. We present PLVM, a personalized LVLM that can continuously add new concepts during a dialogue without incurring additional costs, which significantly enhances its practicality. PLVM introduces Aligner, a pre-trained visual encoder that aligns referential concepts with the queried images. During a dialogue, it extracts features from the reference images associated with these concepts and recognizes the concepts in the queried image, enabling personalization. The computational cost and parameter count of the Aligner are negligible within the entire framework. Comprehensive qualitative and quantitative analyses demonstrate the effectiveness and superiority of PLVM.
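
The tuning-free idea in the abstract (register a concept from a reference image, then recognize it in a queried image without retraining) can be illustrated with a toy sketch. This is not the paper's Aligner: the `ConceptBank` class, the cosine-similarity matching, the 0.8 threshold, and the hand-written three-dimensional feature vectors are all hypothetical stand-ins for whatever a pre-trained visual encoder would actually produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ConceptBank:
    """Hypothetical store mapping concept names to reference-image features."""

    def __init__(self, threshold=0.8):
        # Assumed similarity cutoff; the abstract does not specify one.
        self.threshold = threshold
        self.concepts = {}

    def add(self, name, reference_feature):
        # Adding a concept only stores a feature vector: no weights are updated.
        self.concepts[name] = list(reference_feature)

    def recognize(self, query_features):
        # A concept counts as present if any region feature of the queried
        # image is close enough to that concept's reference feature.
        return [
            name
            for name, ref in self.concepts.items()
            if any(cosine(ref, q) >= self.threshold for q in query_features)
        ]


# Toy vectors standing in for encoder outputs (assumption for illustration).
bank = ConceptBank()
bank.add("Mike", [1.0, 0.0, 0.0])
bank.add("Susan", [0.0, 1.0, 0.0])

query = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]  # two regions of one queried image
present = bank.recognize(query)

# A new concept introduced mid-dialogue is just another stored entry.
bank.add("Rex", [0.0, 0.0, 1.0])
present_after = bank.recognize(query)
```

In this sketch the recognized names could then be passed to the language model so that it answers with 'Mike' rather than 'a boy'; how PLVM actually conditions the LVLM on recognized concepts is not described in the abstract.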

Original language: English
Title of host publication: Proceedings - 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Publisher: IEEE Computer Society
Pages: 3632-3641
Number of pages: 10
ISBN (Electronic): 9798331599942
DOIs
State: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025 - Nashville, United States
Duration: Jun 11 2025 → Jun 12 2025

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025
Country/Territory: United States
City: Nashville
Period: 06/11/25 → 06/12/25
