Skip to main navigation Skip to search Skip to main content

Policy Optimization for Robust Average Cost MDPs

  • SUNY Buffalo
  • University of Texas at Arlington
  • University of Connecticut

Research output: Contribution to journalConference articlepeer-review

2 Scopus citations

Abstract

This paper studies first-order policy optimization for robust average cost Markov decision processes (MDPs). Specifically, we focus on ergodic Markov chains. For robust average cost MDPs, the goal is to optimize the worst-case average cost over an uncertainty set of transition kernels. We first develop a sub-gradient of the robust average cost. Based on the sub-gradient, a robust policy mirror descent approach is further proposed. To characterize its iteration complexity, we develop a lower bound on the difference of robust average cost between two policies and further show that the robust average cost satisfies the Polyak-Łojasiewicz (PL)-condition. We then show that with increasing step size, our robust policy mirror descent achieves a linear convergence rate in the optimality gap, and with constant step size, our algorithm converges to an ϵ-optimal policy with an iteration complexity of O(1/ϵ). The convergence rate of our algorithm matches with the best convergence rate of policy-based algorithms for robust MDPs. Moreover, our algorithm is the first algorithm that converges to the global optimum with general uncertainty sets for robust average cost MDPs. We provide simulation results to demonstrate the performance of our algorithm.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume37
StatePublished - 2024
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: Dec 9 2024Dec 15 2024

Fingerprint

Dive into the research topics of 'Policy Optimization for Robust Average Cost MDPs'. Together they form a unique fingerprint.

Cite this