Bias in Algorithm Portfolio Performance Evaluation / 712
Chris Cameron, Holger H. Hoos, Kevin Leyton-Brown
A Virtual Best Solver (VBS) is a hypothetical algorithm that selects the best solver from a given portfolio of alternatives on a per-instance basis. The VBS idealizes performance when all solvers in a portfolio are run in parallel, and also gives a valuable bound on the performance of portfolio-based algorithm selectors. Typically, VBS performance is measured by running every solver in a portfolio once on a given instance and reporting the best performance over all solvers. Here, we argue that doing so results in a flawed measure that is biased to reporting better performance when a randomized solver is present in an algorithm portfolio. As long as there is more than 1 solver in the portfolio, a single randomized solver can cause VBS bias. Specifically, this flawed notion of VBS tends to show performance better than that achievable by a perfect selector that for each given instance runs the solver with the best expected running time. We report results from an empirical study using solvers and instances submitted to several SAT competitions, in which we observe significant bias on many random instances and some combinatorial instances. We also show that the bias increases with the number of randomized solvers and decreases as we average solver performance over many independent runs per instance. We propose an alternative VBS performance measure by (1) empirically obtaining the solver with best expected performance for each instance and (2) taking bootstrap samples for this solver on every instance, to obtain a confidence interval on VBS performance. Our findings shed new light on widely studied algorithm selection benchmarks and help explain performance gaps observed between VBS and state-of-the-art algorithm selection approaches.