Why Minimax Works: An Alternative Explanation Mitja Lustrek and Matjaz Gams Ivan Bratko Jozef Stefan Institute University of Ljubljana Department of Intelligent Systems Faculty of Computer and Information Science Jamova 39, 1000 Ljubljana, Slovenia Trzaska 25, 1000 Ljubljana, Slovenia mitja.lustrek@ijs.si, matjaz.gams@ijs.si bratko@fri.uni-lj.si Abstract Gams, 1982; Beal, 1982; Nau, 1982; Scheucher and Kaindl, 1998; Lustrek, 2004]. In game-playing programs relying on the minimax In most research on the minimax pathology, true position principle, deeper searches generally produce better values were losses or wins. This seems reasonable, since in evaluations. Theoretical analyses, however, suggest games like chess, positions can indeed only be lost or won that in many cases minimaxing amplifies the noise (or drawn). In practice, however, a game playing program introduced by the heuristic function used to needs an evaluation function that makes it possible to evaluate the leaves of the game tree, leading to maintain a direction of play, gradually moving toward a what is known as pathological behavior, where win, not just maintaining a won position without achieving deeper searches produce worse evaluations. In the final goal. This requires a multivalued or even real- most of the previous research, positions were valued evaluation function. evaluated as losses or wins. Dependence between We introduce a minimax model using real-valued the values of positions close to each other was evaluation function and a way to interpret real values as identified as the property of realistic game trees losses and wins. This leads to a new, simpler explanation of that eliminates the pathology and explains why the pathology. Namely, node values at lower levels of the minimax is successful in practice. In this paper we game tree are more dispersed. If the error of the evaluation present an alternative explanation that does not rely function, represented by normally distributed noise, is on value dependence. We show that if real numbers independent of the depth of search, it turns out that the error are used for position values, position values tend to in terms of loss/win evaluations decreases with the depth of be further apart at lower levels of the game tree, search, because larger dispersion means a larger proportion which leads to a larger proportion of more extreme of more extreme positions where error is less likely. This is positions, where error is less probable. Decreased different from what the previous analyses assumed and is probability of error in searches to greater depths is sufficient to eliminate the pathology. sufficient to eliminate the pathology and no The paper is organized as follows. Section 2 presents the additional properties of game trees are required. minimax pathology and gives an overview of the attempts to explain it. Section 3 introduces a minimax model based on 1 Introduction real-number position values. Section 4 shows why the model from section 3 is not pathological, while seemingly Most game-playing programs are based on the minimax similar models used in previous research were. Section 5 principle. Such programs choose the best move by searching explains whether minimax in general can be expected to the game tree to a chosen depth, heuristically evaluating the behave non-pathologically. Section 6 concludes the paper leaves and then propagating their values to the root using and points out where further research is needed. the minimax principle. It is generally agreed that deeper searches produce better evaluations at the root. However, 2 The Pathology and Related Work first attempts to explain this mathematically yielded the paradoxical result that minimaxing amplifies the error of the The minimax pathology was discovered independently by heuristic evaluations and that consequently deeper searches Nau [1979] and Beal [1980]. Beal's basic minimax model produce worse evaluations [Nau, 1979; Beal, 1980]. This made several assumptions: phenomenon is called the minimax pathology [Nau, 1979]. 1. game tree has a uniform branching factor; It was evident that the setting of these mathematical 2. nodes of the tree can have two values: loss and win; analyses omitted some property of real games that 3. node values are distributed so that at each level of the tree eliminates the pathology. Several explanation were the proportion of losses for the side to move is the same; proposed, but eventually most researchers came to the 4. node values within each level of the tree are independent conclusion that the property they were looking for is the of each other; similarity of positions close to each other [Bratko and only indirectly and it is not known whether it applies to 5. the error of heuristic evaluation at a node at the lowest cases other than the endgame they studied. level of search, being the probability of mistaking a loss 3. Having node values distributed so that ki = cb for all i not for a win or vice versa, is independent of the depth of search and the true value of the node. only simplifies calculations, it is generally agreed to be We proceed to present Beal's basic model, although our necessary [Bratko and Gams, 1982; Beal, 1982; Nau, analysis is mostly based on later work. Negamax 1982]. If k0 cb is chosen, ki starts to oscillate between representation is used in this section, i.e. node values are values close to 0 and 1 for relatively small i, meaning that viewed as lost or won from the perspective of the side to we are dealing with games that are almost certainly won move. Let b be the branching factor of the tree, d the depth for one side and as such not interesting. of search and ki the probability of a node at i-th level being 4. Most researchers [Bratko and Gams, 1982; Beal, 1982; Nau, 1982; Scheucher and Kaindl, 1998; Lustrek, 2004] lost. Levels are numbered downwards: from 0 for root to d agreed that similarity of positions close to each other for the lowest level of search. eliminates the pathology, although they arrived at this A node can only be lost if all of its descendants are won conclusion in different ways. Pearl [1983] claimed that (for the opponent), so the relation between the values of k at early terminations are the culprit, but these can also be consecutive levels is governed by equation (1). interpreted as a form of node-value dependence. ki = (1 ­ ki+1)b (1) 5. Pearl [1983] showed that in order to overcome the Assumption 3 requires ki = ki+1, which results in ki = cb for pathology, the error of the evaluation function must all i; for example, c2 = 0.3820. decrease exponentially with the depth of search. It is Two types of evaluation error are possible: a loss can be generally believed that the quality of the evaluation mistaken for a win (false win) or a win for a loss (false cannot vary enough to account for the absence of loss). Let pi and qi be the probabilities of the respective pathology. Scheucher and Kaindl [1998] did use depth- types of error at i-th level. They are calculated according to dependent error. And although such error made increased equations (2) and (3). depth of search more beneficial, node-value dependence 1 was required to altogether eliminate the pathology. pi = (1 - k i +1 ) b (1 - (1 - q i +1 ) b ) = 1 - (1 - q i +1 ) b (2) The conclusion one can make based on the existing ki literature on the minimax pathology is that the pathology is b b 1 j k (1 - k i +1 ) b- j pij+1 (1 - qi +1 ) b - j qi = usually not observed in real games because their position j (3) i +1 1- ki values are not independent of each other. This conclusion is j =1 reinforced by the fact that multiple authors have arrived at it It turns out that if the same pd = qd are used in searches to in different ways. all depths, the error in the root, defined as p0 k0 + q0 (1 ­ k0), increases with d. "This result is disappointing," concluded 3 A Minimax Model Based on Real Values Beal, since it was exactly the opposite of what he set out to show. His model must have been flawed in some way. Even though all the explanations for the absence of In the years following the discovery of the pathology, pathology in minimax provided in the previous section are several researchers attempted to find the flaw in Beal's basic valid, at least under the assumptions their authors made, are model by attacking its assumptions (1. through 5. at the these assumptions really necessary and realistic? We argue beginning of this section). that real numbers should be used for position values, in 1. Michon [1983] observed that the pathology depends on which case another, more basic explanation is sufficient. the probability distribution of branching factor: game Both game-playing programs and humans use trees with uniform branching factor tend to be multivalued position evaluations. There is little doubt this is pathological, while game trees with, for example, necessary in games where the final outcome is multivalued geometrically distributed branching factor do not. It is not (Othello, tarok etc.). In games where the outcome can only known whether real games have any of the non- be a loss, a win and perhaps a draw (chess, checkers etc.), pathological distributions, though. multiple values might seem to be useful only as a way to 2. Bratko and Gams [1982] were the first to experiment with express the uncertainty of a program or human. However, multivalued evaluations. They assigned no particular even given unlimited resources to determine the value of a meaning to different values, which resulted in behavior position, in a losing position, the best one can do against a similar to the one with two values, i.e. pathological. fallible and not fully known opponent is evaluate the Pearl's results were similar [1983]. Scheucher and Kaindl position in terms of the probability of loss. In a winning [1998] and Lustrek [2004] also used multiple and even position, even a perfect two-valued evaluation function real values, but only to establish realistic relations among could maintain a won position indefinitely without actually node values and the magnitude of error, so even though winning (or until termination due to 50-move rule in chess). their models were not pathological, they did not attribute In essence, multivalued evaluation function is necessary to the absence of pathology to real values. Sadikov et al. differentiate between multiple winning (or losing, if only [2003] used multiple values in their analysis of king and such are available) moves. Scheucher and Kaindl [1998] rook versus king chess endgame. They explained the demonstrated on chess that a two-valued evaluation function pathology, but their explanation involves multiple values performs poorly compared to a multivalued one. We propose a minimax model similar to Beal's basic model, except that it uses real numbers for position values: 1. game tree has a uniform branching factor; 2. nodes of the tree have real values; 3. if the real node values are converted to losses and wins, they are distributed so that at each level of the tree the proportion of losses for the side to move is the same; 4. node values within each level of the tree are independent of each other; 5. the error of heuristic evaluation at a node at the lowest level of search, being normally distributed noise, is independent of the depth of search and the true value of the node. Game trees built according to our model are assigned Figure 1: Equivalence of real- and two-value minimaxing. independent uniformly distributed values from [0, 1] interval to the leaves at level dmax. These are true values; We conducted Monte Carlo experiments with game trees true values of internal nodes are obtained by backing up the generated according to our model. Only the results for b = 2 true leaf values using the minimax rule. When searching to and dmax = 10 are presented in this paper; the results for depth d, heuristic values at level d are generated by larger branching factors and depths are similar. The results corrupting the true values with normally distributed noise are averaged over 10,000 game trees. For each tree, there representing the error of the heuristic evaluation function; were 10 repetitions with randomly generated noisy values heuristic values of nodes at levels < d are obtained by for each d. Figure 1 shows position, move and two-value backing up the corrupted values at level d using the error at the root of the game tree with respect to the depth of minimax rule. search; standard deviation of noise is 0.1. Two types of error can be observed at the root of a game 0.4 tree: position error, which is the absolute difference between the true and the heuristic value of the root, and 0.35 move error, which is the probability of choosing a wrong 0.3 move because of position error at the root's descendants. 0.25 Two-value However, neither type of error corresponds directly to the Error error in two-value models used in most of the previous Move 0.2 research: two-value error is defined as the probability of Position 0.15 mistaking a loss at the root for a win or vice versa. 0.1 In order to measure two-value error, real values must be converted to losses and wins. This can be accomplished by 0.05 establishing a threshold t: the values below it are considered 0 losses and the values above it wins. According to Beal's 0 1 2 3 4 5 6 7 8 9 10 assumption 3, if negamax representation is used, ki = cb for d all i. We do not use negamax representation in our model, so Figure 2: Error at the root with respect to the depth of search. ki alternates between cb and 1 ­ cb. Since true values of the leaves are distributed uniformly in [0, 1] interval, kd = cb is As can be seen in Figure 2, all three types of error achieved by setting t = cb. Even though real-value decrease with the depth of search (with the exception of minimaxing is used, ki behaves as desired for i > 0. This even/odd level fluctuations of two-value error). Note that happens for two reasons. First, leaf values in our real-value these observed behaviors are different from Beal's original model, converted to two values, correspond exactly to leaf results [1980], which were pathological. values in Beal's basic model. The probability of a loss at a Even though our primary concern is comparison with leaf with value X is P (X < t), which for uniform distribution Beal's basic model, we checked whether the absence of in [0, 1] interval and t = cb equals cb. In Beal's basic model, pathology occurs only under the described settings or is it the probability of a loss at each leaf is kd, which also equals more general. We tried uniform distribution of error and cb. Second, real- and two-value minimaxing are equivalent normal distribution of node values as well as different forms in the sense that performing minimax on losses and wins of dependence among node values. None of the experiments from level i to j < i gives the same results at level j as yielded pathology except for some rare cases where slightly performing minimax on the underlying real values from pathological behavior was caused by very large static error. level i to j and converting them to losses and wins at level j. There was no pathology in terms of position error, but move This is illustrated in Figure 1; losses are marked with "­" and two-value error did in some cases behave pathologically and wins with "+". when their static values were close to 0.5. However, since 0.5 is the point where evaluations become completely random, this seems to be of little practical importance. Figure 5 shows two-value error at the root when two- 4 Why is Our Model Not Pathological? value error at the lowest level is always 0.1; results for Considering that our model is very similar to Beal's, why Beal's basic model are shown for comparison. is it not pathological? To answer this question, we must 0.45 examine two-value error at the lowest level of search. Beal's 0.4 assumption 5 states that it should be constant with search 0.35 Two-value error depth, but in our model, real-value position error is set to be 0.3 constant instead (which is achieved by using normally 0.25 distributed noise with the same standard deviation at all 0.2 0.15 levels). Two-value error at the lowest level of search is 0.1 shown in Figure 3; standard deviation of noise is 0.1. 0.05 0 0.4 0 1 2 3 4 5 6 7 8 9 10 0.35 Two-value error d 0.3 0.25 Beal's basic model Bi nari zed real-value model 0.2 0.15 Figure 5: Two-value error at the root with respect to the depth of 0.1 search when two-value error at the lowest level of search 0.05 is always 0.1. 0 0 1 2 3 4 5 6 7 8 9 10 As can be seen in Figure 5, the results for binarized real- d value model and the results for Beal's basic model match quite well. The matching is not perfect because in the real- Figure 3: Two-value error at the lowest level of search with value model, the probability of a false win at the lowest respect to the depth of search. level of search is higher than the probability of a false loss. As can be seen in Figure 3, two-value error at the lowest This happens because false wins occur in [0, t) interval, level of search decreases with the depth of search. This is while false losses occur in (t, 0] interval. Since t = c2, the different from Beal's assumption 5. However, to eliminate former is smaller, therefore node values are on average the pathology, Pearl [1983] observed that if the error is closer to the threshold, and hence two-value error is more small, it should decrease by a factor of 1.528 every two likely. The ratio of probability of a false win : probability of levels, i.e. exponentially with the depth of search, while in a false loss is (1 ­ t) : t. If the overall probability of two- Figure 3, it decreases roughly linearly. For Pearl's value error is to remain 0.1, the appropriate settings in observation to be true, the error should be quite small, Beal's model are pd = 0.05 / t and qd = 0.05 / (1- t). Under though. For example, if the error at depth 10 is 0.1 (the these settings, the models match perfectly. value chosen by Bratko and Gams [1982] and close to the values we experimented with), Pearl's approximation gives 5 Minimax Pathology in General two-value error 0.8326 at depth 0, while the exact error is 0.3932. We chose not to work with smaller errors in this What remains to be considered is whether constant paper because move and two-value error are probabilities position error or constant two-value error at the lowest level computed from frequencies and when they are very small, of search is more realistic. There is no pathology in the very large samples are required to obtain meaningful results. former case, while the latter case corresponds to Beal's If noise introduced at the lowest level of search is basic model and is pathological. Game-playing programs adjusted so that two-value error at the lowest level of search use real (or at least multiple) values in their evaluation is always 0.1, position error at the lowest level of search functions. Position error is the most direct representation of increases with the depth of search as shown in Figure 4. the fallibility of these functions and there is no reason to believe that it should increase with the depth of search as 0.12 shown in Figure 4. But can we expect two-value error to 0.1 behave as shown in Figure 3? We will show here that the Position error 0.08 answer is "yes". Game-playing programs are generally not 0.06 concerned with two-value error; if they were, one can easily imagine that it would be large in uncertain positions whose 0.04 values are close to the threshold and small in clearly lost or 0.02 won positions far from the threshold. If both sides are to 0 have comparable chances to win at the root, the value at the 0 1 2 3 4 5 6 7 8 9 10 d root should be close to the threshold. Each level downwards from the root is one move away from the root position. Figure 4: Position error at the lowest level of search with respect Position values usually change gradually, so with each to the depth of search when two-value error at the lowest move, position values can be more different from the root level of search is always 0.1. value. Therefore the average distance of a position value from the threshold at lower levels can indeed be expected to example our model from section 3: dmax = 10 and leaf values be larger than at higher levels, as was also stated by Scheucher and Kaindl [1998]. This is illustrated in Figure 6; are distributed uniformly, therefore F10 (x) = x. Equation (7) can be used to calculate F8 (x) = 4 x2 ­ 4 x3 + x4. Figure 7 darker area represents higher probability of two-value error. shows F8 (x) and F10 (x) in our model. Figure 6: Distance of nodes values from the threshold and its relation to two-value error. Figure 7: Distribution functions of node values at levels 8 and 10 in our model from section 3. In game trees with independent node values, the effect of position error on two-value error can be analyzed As can be seen in Figure 7, F8 (x) is steeper that F10 (x) mathematically. For simplicity, we will only consider b = 2 between x = a and x = b. To determine where Fi­2 (x) is and limit node values to [0, 1] interval. steeper than Fi (x) independently of Fi (x), inequality (9) Due to space limitation, we will only examine false must be solved; note that Fi (x) is written as Fi. losses. Consider the probability of a false loss at a node with dFi - 2 dFi > true value X and heuristic value X ­ e, where e is the node's (9) dFi dFi position error. False loss means that X > t and the heuristic 8 Fi ­ 12 Fi2 + 4 Fi3 > 1 value is on the other side of the threshold, i.e. X ­ e < t. The probability for such a mistake to happen at a node whose The expression dFi­2 / dFi as a function of Fi is shown in true real value is distributed according to distribution Figure 8. function F (x) is calculated according to equation (4). P( X > t X - e < t ) = P(t < X < t + e) = F (t + e) - F (t ) (4) Consider now the probability of a false loss at different levels in a game tree. Let Fi (x) be the distribution function of node values at i-th level of the game tree. If i ­ 2 is a max level, Fi­2 (x) is calculated from Fi­1 (x) according to equation (5). Fi - 2 ( x ) = P ( X i - 2 < x ) = P ( X i -1 < x ) 2 = Fi -1 ( x) 2 (5) Figure 8: dFi­2 / dFi as a function of Fi. If i ­ 1 is a min level, Fi­1 (x) is calculated from Fi (x) according to equation (6). With the help of Figure 8 we can solve inequality (9): 0.1624 < Fi < 0.7304. Values a and b in Figure 7 are Fi -1 ( x) = P( X i -1 < x) = 1 - P ( X i -1 > x) = therefore a = F10­1 (0.1624) and b = F10­1 (0.7304); since (6) = 1 - P( X i > x) 2 = 1 - (1 - P( X i < x)) 2 = 1 - (1 - Fi ( x)) 2 F10 (x) = x, this means a = 0.1624 and b = 0.7304. So whenever the values of Fi (t + e) and Fi (t) are in the interval In order to calculate Fi­2 (x) from Fi (x) in one step, (5) between 0.1624 and 0.7304, false-loss error at level i ­ 2 is and (6) are joined into equation (7). greater than false-loss error at level i for any distribution Fi - 2 ( x) = Fi -1 ( x) 2 = (1 - (1 - Fi ( x)) 2 = function Fi (x). Since Fi (t) = ki, if ki is to be the same for all (7) i, most reasonable distribution functions should satisfy the = 4 Fi ( x) 2 - 4 Fi ( x) 3 + Fi ( x) 4 condition 0.1624 < Fi (t) < 0.7304. We will show that the probability of a static false-loss We now see that two-value error at higher levels is error at higher levels is usually greater than at lower levels, smaller than at lower levels. But is it smaller enough? If pi which is expressed by inequality (8). and qi when searching to depth dmax were computed for all i, P (false loss at level i ­ 2) > P (false loss at level i) these values could be used for pd and qd when searching to (8) Fi­2 (t + e) ­ Fi­2 (t) > Fi (t + e) ­ Fi (t) depths d < dmax. Under these conditions, two-value error at the root would be the same for searches to all depths ­ Inequality (8) means that the difference between the minimax would be neither pathological nor beneficial. If we values of distribution function at points t + e and t is greater can prove that given position error e causes greater two- at higher levels ­ in other words, that the distribution value error at level i ­ 2 if it is introduced at level i ­ 2 function is steeper at higher levels. Let us take as an mathematically that in game trees with independent node (denoted P1) than if it is introduced at level i and then values, two-value error at higher levels is also smaller than backed-up to level i ­ 2 (denoted P2), we will also have at lower levels. Furthermore, we showed that the reduction proven that two-value error as a result of depth-independent of the error is in most cases sufficient to eliminate the position error decreases from the leaves towards the root pathology. The pathology sometimes persists, particularly sufficiently to make minimax non-pathological. P1 and P2 when the error is very large, but such cases are probably are calculated using equations (4) and (7) in different order, rare. Establishing which cases exactly are these is left for resulting in equations (10) and (11). further work. P1 = Fi - 2 (t + e) - Fi - 2 (t ) = In summary, the explanation for the minimax pathology, = 4 Fi (t + e) 2 - 4 Fi (t + e) 3 + Fi (t + e) 4 - (10) or the absence of it, presented in this paper, is a necessary consequence of a real-value minimax model, and does not - ( 4 Fi (t ) - 4 Fi (t ) + Fi (t ) ) 2 3 4 require any other assumption. Even if one were to claim that P2 = 4( Fi (t + e) - Fi (t )) 2 - 4( Fi (t + e) - Fi (t )) 3 + in a purely theoretical sense, real values are inappropriate, (11) they are certainly necessary in practice. And the minimax + ( Fi (t + e) - Fi (t )) 4 pathology is considered pathological because it is at odds We solved inequality P1 > P2 on [0, 1] interval using with what is observed in practice. Mathematica software, resulting in (12). References 6 + Fi (t ) - 4 + 12 Fi (t ) - 7 Fi (t ) 2 (12) Fi (t ) Fi (t + e) [Beal, 1980] Don F. Beal. An analysis of minimax. In 4 Advances in Computer Chess 2, pages 103­109, 1980. The upper limit for Fi (t + e) from inequality (12) is Edinburgh University Press. shown in Figure 9. [Beal, 1982] Don F. Beal. Benefits of minimax search. In Advances in Computer Chess 3, pages 17­24, 1982. Pergamon Press. [Bratko and Gams, 1982] Ivan Bratko and Matjaz Gams. Error analysis of the minimax principle. In Advances in Computer Chess 3, pages 1­15, 1982. Pergamon Press. [Lustrek, 2004] Mitja Lustrek. Minimax pathology and real- number minimax model. In Proceedings of the Thirteenth International Electrotechnical and Computer Figure 9: F (t + e) as a function of F (t). Science Conference, Volume B, pages 137­140, Portoroz, Slovenia, September 2004. Slovenian Section As can be seen in Figure 9, minimax will behave IEEE. pathologically only when F (t + e) is close to 1 (shaded area). Since F (t) is generally not expected to be close to 1, [Michon, 1983] Gerrard P. Michon. Recursive Random this means a large position error. This is consistent with our Games: A probabilistic model for Perfect Information observations from section 3 where there was no pathology Games. Ph.D. thesis, University of California, Los except in some cases when static error was very large. Angeles, 1983. [Nau, 1979] Dana S. Nau. Quality of decision versus depth 6 Conclusion of search on game trees. Ph.D. thesis, Duke University, To analyze the pathology, we designed a minimax model 1979. with real-number position values. The model did not behave [Nau, 1982] Dana S. Nau. An investigation of the causes of pathologically under a wide range of settings, as long as pathology in games. Artificial Intelligence, 19(3): 257­ normally (or uniformly) distributed noise used to model the 278, November 1982. error of heuristic evaluations was independent of the depth [Pearl, 1983] Judea Pearl. On the nature of pathology in of search. However, under these settings, two-value error game searching. Artificial Intelligence, 20(4): 427­453, was not independent of the depth of search, which is July 1983. contrary to what was assumed in most of the previous research. Due to minimax relations among the true values, [Sadikov et al., 2003] Aleksander Sadikov, Ivan Bratko, and both types of error cannot be independent of the depth of Igor Kononenko. Search vs knowledge: Empirical study search simultaneously, because at greater depths, position of minimax on KRK endgame. In Advances in Computer values are on average farther away from the threshold Games: Many Games, Many Challenges, pages 33­44, separating losses from wins; therefore at greater depths, the 2003. Kluwer Academic Publishers. same position error causes smaller two-value error. In real [Scheucher and Kaindl, 1998] Anton Scheucher and games, this is caused by position values starting close to the Hermann Kaindl. Benefits of using multivalued threshold at the root and dispersing gradually as we advance functions for minimaxing. Artificial Intelligence, 99(2): downwards through the game tree. We showed 187­208, March 1998.