Skip to main navigation Skip to search Skip to main content

Provably Faster Algorithms for Bilevel Optimization

  • Ohio State University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

79 Scopus citations

Abstract

Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than Oe-2) of the SGD-based algorithm. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve the complexity of Oe-1.5), which outperforms all existing algorithms by the order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter applications.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
EditorsMarc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
PublisherNeural information processing systems foundation
Pages13670-13682
Number of pages13
ISBN (Electronic)9781713845393
StatePublished - 2021
Event35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: Dec 6 2021Dec 14 2021

Publication series

NameAdvances in Neural Information Processing Systems
Volume17
ISSN (Print)1049-5258

Conference

Conference35th Conference on Neural Information Processing Systems, NeurIPS 2021
CityVirtual, Online
Period12/6/2112/14/21

Fingerprint

Dive into the research topics of 'Provably Faster Algorithms for Bilevel Optimization'. Together they form a unique fingerprint.

Cite this