Multi-Constraint Deep Reinforcement Learning for Smooth Action Control

Guangyuan Zou, Ying He, F. Richard Yu, Longquan Chen, Weike Pan, Zhong Ming

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 3802-3808. https://doi.org/10.24963/ijcai.2022/528

Deep reinforcement learning (DRL) has been studied in a variety of challenging decision-making tasks, e.g., autonomous driving. However, DRL typically suffers from the action shaking problem: the agent may select markedly different actions in states that differ only slightly. A key cause of this problem is inappropriate reward design. In this paper, to address this issue, we propose a novel way to incorporate the smoothness of actions into the reward. Specifically, we introduce sub-rewards and add multiple constraints related to these sub-rewards. In addition, we propose a multi-constraint proximal policy optimization (MCPPO) method to solve the multi-constraint DRL problem. Extensive simulation results show that the proposed MCPPO method achieves better action smoothness than the traditional proportional-integral-differential (PID) controller and mainstream DRL algorithms. The video is available at https://youtu.be/F2jpaSm7YOg.
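To make the sub-reward idea concrete, here is a minimal sketch of one common way to penalize abrupt action changes in the reward signal. This is an illustration only, not the paper's MCPPO formulation: the function names and the squared-difference penalty are assumptions, and MCPPO instead treats such smoothness terms as explicit constraints in the policy optimization.

```python
import numpy as np

def smoothness_penalty(prev_action, action, weight=1.0):
    """Hypothetical sub-reward: penalize the squared change
    between consecutive actions (larger change -> larger penalty)."""
    diff = np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)
    return -weight * float(np.sum(diff ** 2))

def shaped_reward(task_reward, prev_action, action, weight=0.1):
    """Combine the environment's task reward with the smoothness
    sub-reward; a plain weighted sum, unlike MCPPO's constrained form."""
    return task_reward + smoothness_penalty(prev_action, action, weight)
```

For example, an unchanged action incurs no penalty, while a steering command that jumps from 0.0 to 1.0 is penalized in proportion to the squared jump.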
Keywords:
Machine Learning: Deep Reinforcement Learning
Robotics: Behavior and Control