TY - GEN
T1 - Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System
AU - Zeng, Chen
AU - Krisshnakumar, Prajit
AU - Chowdhury, Souma
AU - Hecht, Grant
AU - Shah, Raj Kalpeshkumar
AU - Botta, Eleonora M.
N1 - Publisher Copyright:
© 2022, American Institute of Aeronautics and Astronautics Inc. All rights reserved.
PY - 2022
Y1 - 2022
N2 - A tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches to designing control actions, however, remain challenging and computationally prohibitive to generalize over varying launch scenarios and target (debris) states relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter allow evaluating the episodes of net-based target capture and estimating the capture quality index that serves as the reward feedback to PPO2. Here, the learnt policy is designed to model the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with notable reward improvement during training, the trained policy demonstrates capture performance (over a wide range of launch/target scenarios) that is close to that obtained with reliability-based optimization run over an individual scenario.
AB - A tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches to designing control actions, however, remain challenging and computationally prohibitive to generalize over varying launch scenarios and target (debris) states relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter allow evaluating the episodes of net-based target capture and estimating the capture quality index that serves as the reward feedback to PPO2. Here, the learnt policy is designed to model the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with notable reward improvement during training, the trained policy demonstrates capture performance (over a wide range of launch/target scenarios) that is close to that obtained with reliability-based optimization run over an individual scenario.
KW - Active Debris Removal
KW - Reinforcement Learning
KW - Tether-Net
KW - Uncertainty
UR - https://www.scopus.com/pages/publications/85123888184
U2 - 10.2514/6.2022-2379
DO - 10.2514/6.2022-2379
M3 - Conference contribution
AN - SCOPUS:85123888184
SN - 9781624106316
T3 - AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022
BT - AIAA SciTech Forum 2022
PB - American Institute of Aeronautics and Astronautics Inc, AIAA
T2 - AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022
Y2 - 3 January 2022 through 7 January 2022
ER -