
Distributionally Robust Memory Evolution With Generalized Divergence for Continual Learning

  • Zhenyi Wang
  • Li Shen
  • Tiehang Duan
  • Qiuling Suo
  • Le Fang
  • Wei Liu
  • Mingchen Gao
  • University of Maryland, College Park
  • JD Explore Academy
  • Meta Inc
  • SUNY Buffalo
  • Tencent

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Continual learning (CL) aims to learn from a non-stationary data distribution without forgetting previous knowledge. The effectiveness of existing approaches that rely on memory replay can decrease over time as the model tends to overfit the stored examples. As a result, the model's ability to generalize is significantly constrained. Additionally, these methods often overlook the inherent uncertainty in the memory data distribution, which differs significantly from the distribution of all previous data examples. To overcome these issues, we propose a principled memory evolution framework that dynamically adjusts the memory data distribution. This evolution is achieved by employing distributionally robust optimization (DRO) to make the memory buffer increasingly difficult to memorize. We consider two types of constraints in DRO: f-divergence and Wasserstein ball constraints. For the f-divergence constraint, we derive a family of methods to evolve the memory buffer data in the continuous probability measure space with Wasserstein gradient flow (WGF). For the Wasserstein ball constraint, we solve the problem directly in Euclidean space. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods for alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than the compared CL methods.
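
To make the idea of "evolving" a memory buffer concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a Lagrangian relaxation of a Wasserstein ball constraint: each stored example is moved by gradient ascent on its loss minus a quadratic penalty on how far it drifts from its original position. The linear logistic model, the function names (`logistic_loss`, `loss_grad_wrt_x`, `evolve_memory`), and the parameters `gamma`, `step`, and `n_steps` are all hypothetical choices made for illustration.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Per-example logistic loss for labels y in {-1, +1} (assumed toy model)."""
    return np.log1p(np.exp(-y * (x @ w)))

def loss_grad_wrt_x(w, x, y):
    """Gradient of the per-example logistic loss with respect to the inputs x."""
    margin = y * (x @ w)
    coeff = -y / (1.0 + np.exp(margin))  # derivative of the loss w.r.t. (x @ w)
    return coeff[:, None] * w[None, :]

def evolve_memory(w, x_mem, y_mem, gamma=1.0, step=0.1, n_steps=20):
    """Gradient ascent on loss(x) - gamma * ||x - x0||^2 for each memory example.

    The quadratic penalty is an assumed stand-in for the Wasserstein ball
    constraint: it keeps evolved examples close to the originals.
    """
    x0 = x_mem.copy()
    x = x_mem.copy()
    for _ in range(n_steps):
        grad = loss_grad_wrt_x(w, x, y_mem) - 2.0 * gamma * (x - x0)
        x = x + step * grad
    return x

# Toy data: a fixed linear model and a small buffer it classifies correctly.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
x_mem = rng.normal(size=(8, 5))
y_mem = np.where(x_mem @ w > 0, 1.0, -1.0)

x_hard = evolve_memory(w, x_mem, y_mem)
print("mean loss before:", logistic_loss(w, x_mem, y_mem).mean())
print("mean loss after: ", logistic_loss(w, x_hard, y_mem).mean())
```

In this sketch a larger `gamma` keeps the evolved examples closer to the originals, while a smaller one allows harder examples that drift further. The f-divergence variant described in the abstract operates on the memory distribution through Wasserstein gradient flow and is not covered here.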

Original language: English
Pages (from-to): 14337-14352
Number of pages: 16
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 12
State: Published - Dec 1 2023

Keywords

  • Continual learning
  • Wasserstein gradient flow
  • distributionally robust optimization
  • f-divergence
