Three-Head Neural Network Architecture for Monte Carlo Tree Search
Chao Gao, Martin Müller, Ryan Hayward
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 3762-3768.
https://doi.org/10.24963/ijcai.2018/523
AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used as the prior
action probabilities and the state-value estimate is used for leaf node evaluation.
We propose a three-head neural net architecture with policy, state-value, and action-value outputs, which could lead to more efficient MCTS,
since neural leaf estimates can still be back-propagated in the tree even when node expansion and evaluation are delayed. To effectively train
the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent
and child nodes for data augmentation and regularization. In our experiments for the game of Hex, the action-value head achieves a prediction
error similar to that of the state-value head of a two-head architecture. The resulting neural net models are then combined with
the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural net evaluations, PV-MCTS with
three-head neural nets consistently performs better than with two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.
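The efficiency argument (a leaf estimate is available even before a child is expanded) can be illustrated with a minimal sketch of PV-MCTS simulations with delayed expansion. It assumes a hypothetical three-head evaluator that returns priors P(s,a), a state value v(s), and action values q(s,a), all from the viewpoint of the player to move; an illustrative `expand_threshold` below which a selected child is not evaluated and the parent's q(s,a) is backed up instead; a PUCT-style selection rule; and a toy take-1-or-2 Nim game standing in for Hex. The names `Node`, `simulate`, `Nim`, and `dummy_evaluator` are hypothetical and not taken from the paper's implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical three-head evaluator: state -> (P(s,a), v(s), q(s,a)),
# all from the viewpoint of the player to move in `state`.
Evaluator = Callable[[int], Tuple[Dict[int, float], float, Dict[int, float]]]


class Nim:
    """Toy stand-in for Hex: take 1 or 2 stones; whoever takes the last stone wins."""

    @staticmethod
    def moves(stones: int) -> List[int]:
        return [m for m in (1, 2) if m <= stones]

    @staticmethod
    def play(stones: int, move: int) -> int:
        return stones - move

    @staticmethod
    def is_terminal(stones: int) -> bool:
        return stones == 0

    @staticmethod
    def terminal_value(stones: int) -> float:
        return -1.0  # the player to move at an empty pile has already lost


class Node:
    def __init__(self, state: int, evaluator: Evaluator):
        self.state = state
        # One network call yields priors, v(s) and q(s, a) for every move at once.
        self.priors, self.v, self.q_net = evaluator(state)
        self.n = {a: 0 for a in self.priors}    # visit counts N(s, a)
        self.w = {a: 0.0 for a in self.priors}  # summed backed-up values W(s, a)
        self.children: Dict[int, "Node"] = {}   # expanded children only

    def select(self, c_puct: float) -> int:
        total = sum(self.n.values())

        def score(a: int) -> float:
            # Unvisited edges start from the network's q(s, a) (an assumption).
            q = self.w[a] / self.n[a] if self.n[a] else self.q_net[a]
            return q + c_puct * self.priors[a] * math.sqrt(total + 1) / (1 + self.n[a])

        return max(self.priors, key=score)


def simulate(node: Node, game, evaluator: Evaluator,
             expand_threshold: int, c_puct: float = 1.5) -> float:
    """Run one simulation; return the value for the player to move at `node`.

    `expand_threshold` is an illustrative parameter: a child is expanded and
    evaluated only on its expand_threshold-th visit through this edge.
    """
    a = node.select(c_puct)
    child_state = game.play(node.state, a)
    if game.is_terminal(child_state):
        value = -game.terminal_value(child_state)
    elif a in node.children:
        value = -simulate(node.children[a], game, evaluator, expand_threshold, c_puct)
    elif node.n[a] + 1 >= expand_threshold:
        child = Node(child_state, evaluator)  # delayed expansion finally happens
        node.children[a] = child
        value = -child.v                      # state-value head at the new leaf
    else:
        value = node.q_net[a]                 # no network call: back up parent's q(s, a)
    node.n[a] += 1
    node.w[a] += value
    return value


calls = {"n": 0}


def dummy_evaluator(stones: int):
    """Uniform priors and zero values; counts how often the 'network' is called."""
    calls["n"] += 1
    moves = Nim.moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0, {m: 0.0 for m in moves}


root = Node(7, dummy_evaluator)
for _ in range(200):
    simulate(root, Nim, dummy_evaluator, expand_threshold=3)
print("simulations:", sum(root.n.values()), "network calls:", calls["n"])
```

With `expand_threshold=1` every selected child is expanded and evaluated on its first visit; larger thresholds save network calls because early visits back up the q(s,a) already computed at the parent, at the cost of relying on that estimate until the child is evaluated.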
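The parent-child relations can be written in negamax form, with values from the viewpoint of the player to move: q(s,a) = -v(s'), where s' is the position reached by move a, and v(s) = max_a q(s,a) under optimal play. A minimal sketch, assuming these relations are turned into a squared-error consistency penalty that could be added to the training loss; the function name `optimality_penalty`, its arguments, and the equal weighting of the two terms are hypothetical, not the paper's exact loss.

```python
from typing import Dict


def optimality_penalty(v_parent: float,
                       q_parent: Dict[int, float],
                       v_children: Dict[int, float]) -> float:
    """Squared-error penalty for violating the negamax relations (illustrative only).

    v_parent   -- predicted v(s) for the parent position s
    q_parent   -- predicted q(s, a) for every legal move a in s
    v_children -- predicted v(s') for some children s' reached by move a, keyed by a
    All values are from the viewpoint of the player to move in that position.
    """
    # Optimal play: the parent's value equals its best action value.
    max_term = (v_parent - max(q_parent.values())) ** 2
    if not v_children:
        return max_term
    # Zero-sum alternation: q(s, a) should equal minus the child's state value.
    child_terms = [(q_parent[a] + v_child) ** 2 for a, v_child in v_children.items()]
    return max_term + sum(child_terms) / len(child_terms)


# Consistent outputs give zero penalty; inconsistent ones are penalized.
print(optimality_penalty(0.5, {0: 0.5, 1: -0.2}, {0: -0.5, 1: 0.2}))  # 0.0
print(optimality_penalty(0.0, {0: 0.5}, {0: 0.0}))                   # 0.5
```

Since Hex has no draws, the same relation q(s,a) = -v(s') also lets a game record supply an action-value target for each played move: the negation of the next position's state-value target.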
Keywords:
Multidisciplinary Topics and Applications: Computer Games
Heuristic Search and Game Playing: Heuristic Search
Machine Learning: Deep Learning
Heuristic Search and Game Playing: Heuristic Search and Machine Learning