ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

  • Jianyi Zhang
  • Aashiq Muhamed
  • Aditya Anantharaman
  • Guoyin Wang
  • Changyou Chen
  • Kai Zhong
  • Qingjun Cui
  • Yi Xu
  • Belinda Zeng
  • Trishul Chilimbi
  • Yiran Chen
  • Duke University
  • Amazon.com, Inc.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

20 Scopus citations

Abstract

Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the large-scale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher’s soft labels and predictions can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in teacher and student embedding spaces. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).
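
The two ideas named in the abstract, retrieving teacher soft labels from a knowledge base and aligning relational knowledge between teacher and student embedding spaces, can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation: the function names, the use of cosine similarity, the top-k retrieval, the temperature values, and the KL-based alignment loss are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_soft_labels(query_emb, kb_emb, kb_soft_labels, k=2, temperature=0.1):
    """Hypothetical retrieval step: for each query embedding, take the k most
    similar knowledge-base entries and return a similarity-weighted average
    of their stored teacher soft labels."""
    sims = l2_normalize(query_emb) @ l2_normalize(kb_emb).T   # (n_query, n_kb)
    top_idx = np.argsort(-sims, axis=1)[:, :k]                # (n_query, k)
    top_sims = np.take_along_axis(sims, top_idx, axis=1)      # (n_query, k)
    weights = softmax(top_sims / temperature, axis=1)         # (n_query, k)
    # kb_soft_labels[top_idx] has shape (n_query, k, n_classes).
    return np.einsum("qk,qkc->qc", weights, kb_soft_labels[top_idx])

def relational_alignment_loss(student_emb, teacher_emb, temperature=0.1):
    """Hypothetical alignment loss: mean KL(teacher || student) between the
    row-wise similarity distributions computed within a batch. Embedding
    widths may differ because only batch-by-batch similarities are compared."""
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    p_teacher = softmax(t @ t.T / temperature, axis=1)
    p_student = softmax(s @ s.T / temperature, axis=1)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=1)
    return float(np.mean(kl))

# Toy knowledge base: three teacher embeddings with their soft labels.
kb_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
kb_soft_labels = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
query_emb = np.array([[1.0, 0.0]])

nearest_only = retrieve_soft_labels(query_emb, kb_emb, kb_soft_labels, k=1)
blended = retrieve_soft_labels(query_emb, kb_emb, kb_soft_labels, k=2)
```

With `k=1` the query simply inherits the soft label of its nearest knowledge-base entry; with larger `k` the result is a convex combination of neighbors' soft labels, so it remains a valid probability distribution. The alignment loss is zero when the student reproduces the teacher's pairwise similarity structure, which is what would make retrieval with student embeddings behave like retrieval with teacher embeddings in this sketch.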

Original language: English
Title of host publication: Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 1128-1136
Number of pages: 9
ISBN (Electronic): 9781959429715
DOIs
State: Published - 2023
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: Jul 9 2023 to Jul 14 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 2
ISSN (Print): 0736-587X

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Toronto
Period: 07/9/23 to 07/14/23
