
Repulsive attention: Rethinking multi-head attention as Bayesian inference

  • Bang An
  • Jie Lyu
  • Zhenyi Wang
  • Chunyuan Li
  • Changwei Hu
  • Fei Tan
  • Ruiyi Zhang
  • Yifan Hu
  • Changyou Chen

  • SUNY Buffalo
  • Microsoft USA
  • Yahoo Research Labs
  • Duke University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

The neural attention mechanism plays an important role in many natural language processing applications. In particular, multi-head attention extends single-head attention by allowing a model to jointly attend to information from different perspectives. However, without explicit constraints, multi-head attention may suffer from attention collapse, an issue in which different heads extract similar attentive features, limiting the model's representation power. In this paper, for the first time, we provide a novel understanding of multi-head attention from a Bayesian perspective. Based on recently developed particle-optimization sampling techniques, we propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention and consequently strengthens the model's expressiveness. Remarkably, our Bayesian interpretation provides theoretical insight into questions that are not well understood: why and how one uses multi-head attention. Extensive experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity, leading to more informative representations with consistent performance improvement on multiple tasks.
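
To make the idea of "repulsiveness" concrete, the sketch below shows one well-known particle-optimization update, Stein variational gradient descent (SVGD) with an RBF kernel. This is an illustration under assumptions, not the paper's implementation: the abstract does not say which particle-optimization method is used, and the names `rbf_kernel`, `svgd_direction`, `grad_log_p`, and `bandwidth` are hypothetical. Each row of `particles` stands in for the flattened parameters of one attention head.

```python
import numpy as np


def rbf_kernel(particles, bandwidth):
    """Return the RBF kernel matrix and pairwise differences x_i - x_j."""
    # diffs[i, j] = x_i - x_j, shape (n, n, d)
    diffs = particles[:, None, :] - particles[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    kernel = np.exp(-sq_dists / bandwidth)
    return kernel, diffs


def svgd_direction(particles, grad_log_p, bandwidth=1.0):
    """SVGD update direction for n particles of dimension d.

    particles:  array of shape (n, d); one row per head (an assumption here).
    grad_log_p: hypothetical callable returning the gradient of the log
                target density at each particle, shape (n, d).
    """
    n = particles.shape[0]
    kernel, diffs = rbf_kernel(particles, bandwidth)
    grads = grad_log_p(particles)

    # Driving term: kernel-weighted average of the particles' gradients.
    driving = kernel @ grads

    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = (2/h) sum_j k_ij (x_i - x_j).
    # It pushes particle i away from nearby particles j.
    repulsive = (2.0 / bandwidth) * np.sum(kernel[:, :, None] * diffs, axis=1)

    return (driving + repulsive) / n
```

With a zero gradient from the target density, only the repulsive term remains, so two nearby particles are pushed apart while identical particles receive no push. In a training loop, the returned direction would be added to the particles with a step size, in place of each head's independent gradient step.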

Original language: English
Title of host publication: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 236-255
Number of pages: 20
ISBN (Electronic): 9781952148606
State: Published - 2020
Event: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online
Duration: Nov 16 2020 - Nov 20 2020

Publication series

Name: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
City: Virtual, Online
Period: 11/16/20 - 11/20/20
