Abstract
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of O˜(1/ϵ) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approaches by a factor of O(1/ϵ).
| Original language | English |
|---|---|
| Article number | 107107 |
| Journal | Operations Research Letters |
| Volume | 54 |
| DOIs | |
| State | Published - May 2024 |
Keywords
- Accelerated gradient method
- Constrained Markov decision process
- Entropy regularization
- Policy optimization
- Primal-dual algorithm
Fingerprint
Dive into the research topics of 'Faster algorithm and sharper analysis for constrained Markov decision process'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver