
REVISITING LARGE-SCALE NON-CONVEX DISTRIBUTIONALLY ROBUST OPTIMIZATION

  • Qi Zhang
  • Yi Zhou
  • Simon Khan
  • Ashley Prater-Bennette
  • Lixin Shen
  • Shaofeng Zou
  • Arizona State University
  • Texas A&M University
  • Air Force Research Laboratory
  • Syracuse University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Distributionally robust optimization (DRO) is a powerful technique for training robust machine learning models that perform well under distribution shifts. In contrast to empirical risk minimization (ERM), DRO optimizes the expected loss under the worst-case distribution in an uncertainty set of distributions. This paper revisits the important problem of DRO with non-convex smooth loss functions. For this problem, Jin et al. (2021) showed that the dual problem satisfies the generalized (L₀, L₁)-smoothness condition and that the gradient noise satisfies the affine variance condition, designed a mini-batch normalized gradient descent algorithm with momentum, and proved its convergence and complexity. In this paper, we show that the dual problem and the gradient noise satisfy a simpler yet more precise partially generalized smoothness condition and partially affine variance condition, obtained by studying the optimization variable and the dual variable separately, which in turn yields a much simpler convergence analysis. We develop a double stochastic gradient descent with clipping (D-SGD-C) algorithm that converges to an ϵ-stationary point with O(ϵ⁻⁴) gradient complexity, which matches the result of Jin et al. (2021). Our proof is much simpler, thanks to the more precise characterization of partially generalized smoothness and partially affine variance noise. We further design a variance-reduced method that achieves a lower gradient complexity of O(ϵ⁻³). Our theoretical results and insights are further verified numerically on a number of tasks, and our algorithms outperform the existing DRO method (Jin et al., 2021).
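To make the roles of the optimization variable, the dual variable, and clipping concrete, the following is a minimal sketch of generic clipped SGD on a toy DRO dual objective. It is not the paper's D-SGD-C algorithm: the abstract does not specify the divergence, the loss, or the update rule, so the KL-style dual `lam * mean(exp((loss - eta) / lam)) - lam + eta`, the least-squares loss, and every name and default value (`clip`, `dual_grad`, `clipped_sgd`, `lam`, `step`, `c`) are assumptions made for illustration only.

```python
import numpy as np

def clip(g, c):
    """Rescale g so that its Euclidean norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def dual_grad(x, eta, A, b, lam):
    """Mini-batch gradient of lam * mean(exp((loss - eta) / lam)) - lam + eta.

    Assumed toy setup: per-sample loss 0.5 * (a @ x - b)**2 and a KL-style
    penalty with weight lam; neither choice is taken from the paper.
    """
    resid = A @ x - b
    loss = 0.5 * resid ** 2
    w = np.exp((loss - eta) / lam)      # exponential tilting weights
    gx = (w * resid) @ A / len(b)       # gradient w.r.t. model parameter x
    geta = 1.0 - w.mean()               # gradient w.r.t. dual variable eta
    return gx, geta

def clipped_sgd(A, b, lam=5.0, step=0.05, c=1.0, batch=32, iters=500, seed=0):
    """Clipped SGD jointly over (x, eta); hypothetical hyperparameters."""
    rng = np.random.default_rng(seed)
    x, eta = np.zeros(A.shape[1]), 0.0
    for _ in range(iters):
        idx = rng.integers(0, len(b), size=batch)
        gx, geta = dual_grad(x, eta, A[idx], b[idx], lam)
        g = clip(np.append(gx, geta), c)  # bound the size of each update
        x, eta = x - step * g[:-1], eta - step * g[-1]
    return x, eta
```

In this toy objective the exponential weights can make the gradient with respect to `x` very large when `eta` is far from the typical loss, which is the kind of behavior that motivates clipping. The clip bounds each update by `step * c` regardless of how large the stochastic gradient is.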

Original language: English
Title of host publication: 13th International Conference on Learning Representations, ICLR 2025
Publisher: International Conference on Learning Representations, ICLR
Pages: 50363-50389
Number of pages: 27
ISBN (Electronic): 9798331320850
State: Published - 2025
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: Apr 24 2025 → Apr 28 2025

Publication series

Name: 13th International Conference on Learning Representations, ICLR 2025

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 04/24/25 → 04/28/25
