Weighted Double Q-learning

Zongzhang Zhang, Zhiyuan Pan, Mykel J. Kochenderfer

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 3455-3461. https://doi.org/10.24963/ijcai.2017/483

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments because it overestimates action values. The overestimation stems from using a single estimator, which takes the maximum estimated action value as an approximation of the maximum expected action value. To avoid this overestimation, the double Q-learning algorithm was proposed, which uses the double estimator method: it maintains two estimators learned from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. However, double Q-learning sometimes underestimates action values. This paper introduces a weighted double Q-learning algorithm, based on the construction of a weighted double estimator, with the goal of balancing the overestimation of the single estimator against the underestimation of the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
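The interpolation the abstract describes can be sketched as a single update target that blends the two estimators. This is a minimal illustration, not the paper's exact algorithm: in the paper the weight is computed per state, whereas here `beta` is treated as a fixed hyperparameter, and the function name `weighted_double_target` is our own.

```python
def weighted_double_target(qa, qb, reward, gamma, beta):
    """Weighted double Q-learning target for one transition (sketch).

    qa, qb -- action-value estimates for the next state from the two
              independent estimators (e.g. the two Q-tables).
    beta   -- interpolation weight in [0, 1]: beta = 1 recovers the
              single (max) estimator used by standard Q-learning,
              beta = 0 recovers the double estimator of double
              Q-learning.
    """
    # The first estimator selects the greedy action for the next state.
    a_star = max(range(len(qa)), key=lambda a: qa[a])
    # Blend the two estimators' values for that action.
    value = beta * qa[a_star] + (1.0 - beta) * qb[a_star]
    return reward + gamma * value
```

For example, with `qa = [1.0, 3.0]` and `qb = [2.0, 0.5]`, `beta = 1.0` yields the optimistic single-estimator target `3.0`, while `beta = 0.0` yields the double-estimator target `0.5`; intermediate weights trade off the two biases.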
Keywords:
Machine Learning: Reinforcement Learning
Uncertainty in AI: Markov Decision Processes