TY - GEN
T1 - On the source-to-target gap of robust double deep Q-learning in digital twin-enabled wireless networks
AU - McManus, Maxwell
AU - Guan, Zhangyu
AU - Mastronarde, Nicholas
AU - Zou, Shaofeng
N1 - Publisher Copyright: © 2022 SPIE. All rights reserved.
PY - 2022
Y1 - 2022
AB - Digital twins have been envisioned as a key tool for enabling data-driven real-time monitoring and prediction, automated modeling, and zero-touch control and optimization in next-generation wireless networks. However, because the dynamics of the source domain (i.e., the digital twin) and the target domain (i.e., the real network) are mismatched, policies generated in the source domain by traditional machine learning algorithms may suffer significant performance degradation when applied in the target domain; this is the so-called "source-to-target (S2T) gap" problem. In this work, we experimentally investigate the S2T gap in digital twin-enabled wireless networks using a new class of reinforcement learning algorithms referred to as robust deep reinforcement learning. We first design a robust learning framework, based on a combination of double deep Q-learning and an R-contamination model, that controls policy robustness through the adversarial dynamics expected in the target domain. We then test the robustness of the learning framework on UBSim, an event-driven universal simulator for broadband mobile wireless networks. The source domain is first constructed in UBSim by creating a virtual representation of an indoor testing environment at the University at Buffalo; the target domain is then constructed by modifying the source domain in terms of blockage distribution and user locations, among others. We compare the robust learning algorithm with traditional reinforcement learning algorithms in the presence of controlled model mismatch between the source and target domains. Through experiments, we demonstrate that, with proper selection of the parameter R, robust learning algorithms can significantly reduce the S2T gap, whereas a poorly chosen R can make them either too conservative or too explorative. We observe that robust policy transfer is especially effective for target domains with time-varying blockage dynamics.
KW - Digital Twin
KW - Domain Adaptation
KW - Reinforcement Learning
KW - Source-to-Target Gap
KW - Zero-touch Networks
UR - https://www.scopus.com/pages/publications/85133586272
U2 - 10.1117/12.2618612
DO - 10.1117/12.2618612
M3 - Conference contribution
AN - SCOPUS:85133586272
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Big Data IV
A2 - Ahmad, Fauzia
A2 - Markopoulos, Panos P.
A2 - Ouyang, Bing
PB - SPIE
T2 - Big Data IV: Learning, Analytics, and Applications 2022
Y2 - 6 June 2022 through 12 June 2022
ER -