Independence-aware Advantage Estimation

Pushi Zhang, Li Zhao, Guoqing Liu, Jiang Bian, Minlie Huang, Tao Qin, Tie-Yan Liu

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3349-3355. https://doi.org/10.24963/ijcai.2021/461

Most existing advantage-function estimation methods in reinforcement learning suffer from high variance, which scales unfavorably with the time horizon. To address this challenge, we propose to identify independence between the current action and future states in the environment, which can then be leveraged to effectively reduce the variance of advantage estimation. In particular, the identified independence can be naturally exploited to construct a novel importance-sampling advantage estimator whose variance is close to zero even when the Monte-Carlo return signal has large variance. To guard against cases where the new estimator itself introduces high variance, we combine it with the existing Monte-Carlo estimator through a reward decomposition model learned by minimizing the estimation variance. Experiments demonstrate that our method achieves higher sample efficiency than existing advantage estimation methods in complex environments.
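
The variance-minimizing combination described in the abstract can be illustrated with a small sketch. The Python snippet below is a hypothetical illustration, not the paper's implementation: the names combine_advantages, a_mc, and a_is are made up here, and a fixed per-pair mixing weight stands in for the paper's learned reward decomposition. It shows the convex combination weight that minimizes the variance of a mixture of two unbiased, uncorrelated advantage estimators.

import numpy as np

def combine_advantages(a_mc, a_is, eps=1e-8):
    # a_mc: Monte-Carlo advantage samples for one state-action pair (high variance).
    # a_is: importance-sampling advantage samples for the same pair (low variance
    #       when future states are independent of the current action).
    var_mc = np.var(a_mc, ddof=1)
    var_is = np.var(a_is, ddof=1)
    # For two unbiased, uncorrelated estimators, the variance of
    # w * A_mc + (1 - w) * A_is is minimized at
    # w = Var(A_is) / (Var(A_mc) + Var(A_is)).
    w = var_is / (var_mc + var_is + eps)
    return w * np.mean(a_mc) + (1.0 - w) * np.mean(a_is)

# Example: a noisy MC estimate combined with a near-deterministic IS estimate.
rng = np.random.default_rng(0)
a_mc = rng.normal(loc=0.5, scale=2.0, size=256)   # unbiased but noisy
a_is = rng.normal(loc=0.5, scale=0.05, size=256)  # unbiased, close-to-zero variance
print(combine_advantages(a_mc, a_is))             # ~0.5; weight favors the IS estimate

The sketch only shows why variance-minimizing mixing falls back on the lower-variance estimator when the independence property holds; in the paper the split between the two estimators is governed by the learned reward decomposition model.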
Keywords:
Machine Learning: Reinforcement Learning
Machine Learning: Deep Reinforcement Learning