
Designing Near-Optimal Partially Observable Reinforcement Learning

  • Ohio State University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Partially observable Markov decision processes (POMDPs) have been widely applied in various real-world applications. However, existing results have shown that learning in POMDPs is intractable in the worst case. The main challenge lies in the lack of latent state information. For example, in wireless channel scheduling, due to energy and security constraints, it is usually difficult or impossible for the user to know the conditions/states of all channels. Thus, a key fundamental question here is: how much online state information (OSI) is sufficient to achieve tractability? In this paper, we make the first effort to establish fundamental conditions and methods for bridging the gap between partially observable reinforcement learning and networking with incomplete state information. Specifically, we establish a lower bound that reveals a surprising hardness result: unless we have full OSI, we need an exponentially scaling sample complexity to obtain an ϵ-optimal policy solution for POMDPs. Nonetheless, motivated by the structures of practical systems, we identify important subclasses of POMDPs that are tractable, even with only partial OSI. For two subclasses of POMDPs with partial OSI, we provide new algorithms that are proved to be near-optimal by establishing new regret upper and lower bounds.

Original language: English
Title of host publication: 2024 IEEE Military Communications Conference, MILCOM 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 463-468
Number of pages: 6
ISBN (Electronic): 9798350374230
State: Published - 2024
Event: 2024 IEEE Military Communications Conference, MILCOM 2024 - Washington, United States
Duration: Oct 28 2024 to Nov 1 2024

Publication series

Name: Proceedings - IEEE Military Communications Conference MILCOM
ISSN (Print): 2155-7578
ISSN (Electronic): 2155-7586

Conference

Conference: 2024 IEEE Military Communications Conference, MILCOM 2024
Country/Territory: United States
City: Washington
Period: 10/28/24 to 11/1/24

Keywords

  • partial observability
  • regret analysis
  • reinforcement learning
  • sample complexity
  • wireless channel scheduling
