TY - CHAP
T1 - Adversarial Online Reinforcement Learning Under Limited Defender Resources
AU - Shi, Ming
AU - Liang, Yingbin
AU - Shroff, Ness B.
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Reinforcement learning (RL) is a powerful tool for sequential decision making. It has already been a vital component in solving grand challenge problems such as the “protein folding problem.” The RL model is well suited to communication networks because it can learn complex behaviors in the target environment even when information such as channel conditions and user statistics is unavailable. Our focus here is on adversarial RL because of its ability to capture scenarios where the reward and/or the dynamics of the environment change over time, possibly in an adversarial manner. To perform well in such systems, the agent (e.g., a defender) needs to change its policy accordingly over time. However, the agent may not be able to afford frequent policy changes, especially when the reward and/or the dynamics of the environment change rapidly; for example, edge devices have limited energy for changing policies. Thus, in addition to the standard loss metric, switching costs, which capture the cost of changing policies, are regarded as a critical metric in RL. This chapter introduces state-of-the-art results on both bandits and RL with switching costs, their importance for network security under limited defender resources, and interesting directions for future work.
AB - Reinforcement learning (RL) is a powerful tool for sequential decision making. It has already been a vital component in solving grand challenge problems such as the “protein folding problem.” The RL model is well suited to communication networks because it can learn complex behaviors in the target environment even when information such as channel conditions and user statistics is unavailable. Our focus here is on adversarial RL because of its ability to capture scenarios where the reward and/or the dynamics of the environment change over time, possibly in an adversarial manner. To perform well in such systems, the agent (e.g., a defender) needs to change its policy accordingly over time. However, the agent may not be able to afford frequent policy changes, especially when the reward and/or the dynamics of the environment change rapidly; for example, edge devices have limited energy for changing policies. Thus, in addition to the standard loss metric, switching costs, which capture the cost of changing policies, are regarded as a critical metric in RL. This chapter introduces state-of-the-art results on both bandits and RL with switching costs, their importance for network security under limited defender resources, and interesting directions for future work.
UR - https://www.scopus.com/pages/publications/85200479687
U2 - 10.1007/978-3-031-53510-9_10
DO - 10.1007/978-3-031-53510-9_10
M3 - Chapter
AN - SCOPUS:85200479687
T3 - Advances in Information Security
SP - 265
EP - 301
BT - Advances in Information Security
PB - Springer
ER -