
Adversarial Online Reinforcement Learning Under Limited Defender Resources

  • Ohio State University

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Reinforcement learning (RL) is a powerful tool for sequential decision making. It has already been a vital component in solving grand-challenge problems such as the “protein folding problem.” RL is well suited to communication networks because it can learn complex behaviors in the target environment even when information such as channel conditions and user statistics is unavailable. Our focus here is on adversarial RL because it can capture scenarios in which the reward and/or the dynamics of the environment change over time, possibly in an adversarial manner. To perform well in such systems, the agent (e.g., a defender) needs to adapt its policy over time. However, the agent may not be able to afford frequent policy changes, especially when the reward and/or the dynamics change rapidly; for example, energy limitations can make changing policies costly for edge devices. Thus, in addition to the standard loss metric, switching costs, which capture the cost of changing policies, are regarded as a critical metric in RL. This chapter introduces state-of-the-art results on both bandits and RL with switching costs, discusses their importance for network security under limited defender resources, and outlines interesting directions for future work.
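To make the combined metric concrete, here is a minimal Python sketch assuming the simplest switching-cost model: a fixed penalty `switch_cost` charged every time the chosen action (or policy) changes between consecutive rounds. The function name `total_cost` and the toy loss sequence are hypothetical illustrations, not taken from the chapter, which may use other formulations (for example, costs that depend on which policies are switched between).

```python
from typing import Sequence


def total_cost(actions: Sequence[int],
               losses: Sequence[Sequence[float]],
               switch_cost: float) -> float:
    """Cumulative loss of the chosen actions plus a fixed penalty per switch.

    Assumed (hypothetical) model: actions[t] is the action chosen in round t,
    losses[t][a] is the loss of action a in round t (possibly chosen by an
    adversary), and each change of action costs `switch_cost`.
    """
    if len(actions) != len(losses):
        raise ValueError("need exactly one action per round")
    # Standard metric: sum of the losses actually incurred.
    incurred = sum(losses[t][a] for t, a in enumerate(actions))
    # Switching metric: rounds whose action differs from the previous round's.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if cur != prev)
    return incurred + switch_cost * switches


# Hypothetical 4-round example with 2 actions whose losses alternate.
losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
chase = [0, 1, 0, 1]  # always picks the action that is best in that round
stay = [0, 0, 0, 0]   # never switches
print(total_cost(chase, losses, switch_cost=1.0))  # 0 loss + 3 switches -> 3.0
print(total_cost(stay, losses, switch_cost=1.0))   # 2 loss + 0 switches -> 2.0
```

In this toy sequence, chasing the per-round best action incurs zero loss but pays for three switches, while never switching incurs more loss and no switching cost, so under a unit switching cost the lazier policy comes out ahead. This is the tension between adapting to a changing environment and limiting policy changes that the abstract describes.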

Original language: English
Title of host publication: Advances in Information Security
Publisher: Springer
Pages: 265-301
Number of pages: 37
State: Published, 2024

Publication series

Name: Advances in Information Security
Volume: 107
ISSN (Print): 1568-2633
ISSN (Electronic): 2512-2193

