Paper
7 December 2023 Use advantage-better action imitation for policy constraint
Weipeng Liu, Maolin Hou
Proceedings Volume 12941, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023); 129412F (2023) https://doi.org/10.1117/12.3011988
Event: Third International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023), 2023, Yinchuan, China
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from a fixed, previously collected dataset. Unlike in online training, errors in value estimates for out-of-distribution (OOD) actions cannot be corrected by interacting with the environment, which makes offline RL difficult to train. Prior policy constraint methods mitigate these errors by minimizing the deviation from the behavior policy, which in effect trades off RL against imitation learning. However, while the constraint term keeps the learned policy from choosing OOD actions, it may also tie the policy to unfavorable actions in the dataset. In this paper, we propose a simple solution to this problem. Our method uses the imitation learning term introduced by TD3PlusBc as the constraint and re-weights it with a function of each action's advantage value to mitigate the influence of unfavorable actions. To stabilize learning, we also decouple policy evaluation from policy improvement by using implicit Q-learning, which modifies the loss function in a SARSA-style TD backup. Our method, advantage-better action imitation (ABAI), is easy to implement, fast to train, and computationally efficient. ABAI achieves state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning, and learns robustly from a variety of datasets.
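The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state which function of the advantage is used, so the indicator weight below is an assumption, as are all function names, the scalar actions, and the default values of tau and alpha. The expectile loss is the asymmetric regression loss used by implicit Q-learning, and the policy objective follows the general form of the TD3PlusBc (TD3+BC) objective, with the imitation term re-weighted per sample.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss from implicit Q-learning.

    diff is Q(s, a) - V(s); positive errors are weighted by tau,
    negative errors by 1 - tau.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def advantage_weight(advantage):
    """Hypothetical re-weighting function (an assumption, not from the paper):
    imitate a dataset action only if its advantage is positive."""
    return 1.0 if advantage > 0 else 0.0


def policy_loss(q_pi, q_data, v, policy_action, data_action, alpha=2.5):
    """TD3+BC-style policy loss with an advantage-weighted imitation term.

    q_pi[i]          : Q(s_i, pi(s_i))
    q_data[i]        : Q(s_i, a_i) for the dataset action a_i
    v[i]             : V(s_i)
    policy_action[i] : pi(s_i), scalar here for simplicity
    data_action[i]   : a_i, scalar here for simplicity
    """
    n = len(q_pi)
    # TD3+BC normalises the Q term by the mean absolute Q-value.
    lam = alpha / (sum(abs(q) for q in q_pi) / n + 1e-8)
    total = 0.0
    for i in range(n):
        advantage = q_data[i] - v[i]
        bc_term = (policy_action[i] - data_action[i]) ** 2
        total += -lam * q_pi[i] + advantage_weight(advantage) * bc_term
    return total / n
```

With the indicator weight, a dataset action whose advantage is negative contributes nothing to the imitation term, so the policy is pulled only toward actions the critic rates above the state value. A smooth weight, such as a clipped exponential of the advantage, would be an equally plausible choice under the abstract's wording.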
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Weipeng Liu and Maolin Hou "Use advantage-better action imitation for policy constraint", Proc. SPIE 12941, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023), 129412F (7 December 2023); https://doi.org/10.1117/12.3011988
KEYWORDS
Education and training

Machine learning

Tunable filters

Error analysis

Online learning

Robots

Mixtures
