Robust Reinforcement Learning via Progressive Task Sequence

Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Jiqiang Liu

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 455-463. https://doi.org/10.24963/ijcai.2023/51

Robust reinforcement learning (RL) has been a challenging problem due to the gap between simulation and the real world. Existing efforts typically address the robust RL problem by solving a max-min problem, whose main idea is to maximize the cumulative reward under the worst-possible perturbations. However, worst-case optimization leads to either overly conservative solutions or an unstable training process, which in turn degrades policy robustness and generalization performance. In this paper, we tackle this problem from both formulation and algorithm design. First, we formulate robust RL as a max-expectation optimization problem, where the goal is to find an optimal policy under both worst-case and non-worst-case perturbations. Then, we propose a novel framework, DRRL, to solve this max-expectation optimization. Given our definition of feasible tasks, a task generation and sequencing mechanism is introduced to dynamically output tasks at an appropriate difficulty level for the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning that improves policy robustness and training stability. Finally, extensive experiments demonstrate that the proposed method achieves significant performance gains on the unmanned CarRacing game and multiple high-dimensional MuJoCo environments.
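To make the contrast between the max-min objective and the max-expectation objective over progressive tasks concrete, the toy Python sketch below illustrates the general idea only: a policy is updated to maximize its expected return over a batch of feasible tasks, and the task difficulty (perturbation strength) is raised once the current policy handles the present level well. All function names, thresholds, and the toy return model are hypothetical and are not taken from the paper's DRRL implementation.

    import random

    def evaluate(policy, perturbation):
        # Stand-in for rolling out `policy` in an environment perturbed with
        # strength `perturbation` and returning the average episode return.
        # Toy model: returns degrade as the perturbation exceeds the policy's
        # current robustness level.
        return max(0.0, 1.0 - abs(perturbation - policy["robustness"]))

    def update(policy, tasks, lr=0.1):
        # Stand-in for a policy update that maximizes the *expected* return
        # over the sampled task batch (the max-expectation objective),
        # rather than the return under the single worst case.
        avg_difficulty = sum(tasks) / len(tasks)
        policy["robustness"] += lr * (avg_difficulty - policy["robustness"])

    policy = {"robustness": 0.0}
    difficulty = 0.1  # current perturbation magnitude (task difficulty)

    for iteration in range(50):
        # Task generation: sample feasible tasks up to the current difficulty,
        # i.e. tasks the current policy can partially solve but not trivially.
        tasks = [random.uniform(0.0, difficulty) for _ in range(8)]

        # Dynamic multi-task update over both harder (near-worst-case)
        # and easier (non-worst-case) tasks.
        update(policy, tasks)

        # Task sequencing: raise the difficulty only once the policy performs
        # well at the current level, keeping training stable.
        mean_return = sum(evaluate(policy, t) for t in tasks) / len(tasks)
        if mean_return > 0.8:
            difficulty = min(1.0, difficulty + 0.1)

    print(policy, difficulty)

In this sketch the curriculum is driven by the policy's own performance, so the difficulty schedule adapts to training progress instead of being fixed in advance, which mirrors the progressive task sequence described in the abstract.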
Keywords:
AI Ethics, Trust, Fairness: ETF: Safety and robustness
Agent-based and Multi-agent Systems: MAS: Agent theories and models