We present a method for applying deep reinforcement learning to maritime platform defense, demonstrating how to train agents that schedule countermeasures to defend a fleet of ships against stochastic raids in a simulated environment. Our Schedule Evaluation Simulation (SEvSim) environment was developed with extensive input from subject matter experts and contains realistic threat characteristics, weapon efficacies, and constraints among weapons. Our approach is novel in both the representation of the system state and the neural network architecture: threats are represented as vectors containing information on the projected effect of different scheduling actions on their viability and are fed to network input “slots” in randomized locations. Agents are trained using Proximal Policy Optimization, a state-of-the-art method for model-free learning. We evaluate the performance of our approach, finding that it learns scheduling strategies that both reliably neutralize threats and conserve inventory. We then discuss the challenges that remain in bringing neural-network-based control to realization in this application space, among them the needs to integrate humans into the loop, provide safety assurances, and enable continual learning.
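The randomized-slot state representation described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the slot count, feature dimension, and function names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SLOTS = 8   # fixed number of network input slots (assumed, not from the paper)
FEAT_DIM = 4  # length of each per-threat feature vector (assumed)

def build_observation(threat_features):
    """Place each threat's feature vector into a randomly chosen input
    slot; unused slots remain zero. Randomizing slot assignment on each
    step discourages the network from tying behavior to slot position.

    threat_features: list of length-FEAT_DIM arrays, one per active
    threat, e.g. projected effects of candidate scheduling actions on
    that threat's viability.
    """
    obs = np.zeros((N_SLOTS, FEAT_DIM))
    slots = rng.choice(N_SLOTS, size=len(threat_features), replace=False)
    for slot, feats in zip(slots, threat_features):
        obs[slot] = feats
    return obs.flatten()  # flat vector for the policy network input

# Example: three active threats (feature values are illustrative).
threats = [np.array([0.9, 0.2, 0.5, 1.0]),
           np.array([0.4, 0.8, 0.1, 0.0]),
           np.array([0.7, 0.7, 0.3, 1.0])]
obs = build_observation(threats)
```

The flattened observation would then be passed to a PPO policy; because slot positions are shuffled, the learned policy must key on threat features rather than input position.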