Skip to main navigation Skip to search Skip to main content

Demystify Hyperparameters for Stochastic Optimization with Transferable Representations

  • University of Virginia

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

7 Scopus citations

Abstract

This paper studies the convergence and generalization of a large class of Stochastic Gradient Descent (SGD) momentum schemes, in both learning from scratch and transferring representations with fine-tuning. Momentum-based acceleration of SGD is the default optimizer for many deep learning models. However, there is a lack of general convergence guarantees for many existing momentum variants in conjunction withstochastic gradient. It is also unclear how the momentum methods may affect thegeneralization error. In this paper, we give a unified analysis of several popular optimizers, e.g., Polyak's heavy ball momentum and Nesterov's accelerated gradient. Our contribution is threefold. First, we give a unified convergence guarantee for a large class of momentum variants in thestochastic setting. Notably, our results cover both convex and nonconvex objectives. Second, we prove a generalization bound for neural networks trained by momentum variants. We analyze how hyperparameters affect the generalization bound and consequently propose guidelines on how to tune these hyperparameters in various momentum schemes to generalize well. We provide extensive empirical evidence to our proposed guidelines. Third, this study fills the vacancy of a formal analysis of fine-tuning in literature. To our best knowledge, our work is the first systematic generalizability analysis on momentum methods that cover both learning from scratch and fine-tuning. Our codes are available https://github.com/jsycsjh/Demystify-Hyperparameters-for-Stochastic-Optimization-with-Transferable-Representations .

Original languageEnglish
Title of host publicationKDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1706-1716
Number of pages11
ISBN (Electronic)9781450393850
DOIs
StatePublished - Aug 14 2022
Event28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: Aug 14 2022Aug 18 2022

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/TerritoryUnited States
CityWashington
Period08/14/2208/18/22

Keywords

  • automl
  • deep learning optimization
  • fine-tuning
  • generalization bound
  • stochastic gradient descent

Fingerprint

Dive into the research topics of 'Demystify Hyperparameters for Stochastic Optimization with Transferable Representations'. Together they form a unique fingerprint.

Cite this