Proceedings Abstracts of the Twenty-Fourth International Joint Conference on Artificial Intelligence

Statistical Regimes and Runtime Prediction / 318
Barry Hurley, Barry O'Sullivan

The last decade has seen a growing interest in solver portfolios, automated solver configuration, and runtime prediction methods. At their core, these methods rely on deterministic, consistent behaviour from the underlying algorithms and solvers. However, modern state-of-the-art solvers have elements of stochasticity built in, such as randomised variable and value selection, tie-breaking, and randomised restarting. Such features can elicit dramatic variations in overall performance between repeated runs of the same solver, often by several orders of magnitude. Despite the success of the aforementioned fields, such performance variations in the underlying solvers have largely been ignored. Supported by a large-scale empirical study that employs many years of industrial SAT Competition instances and includes repeated runs, we present statistical and empirical evidence that this performance variation necessitates a change in the evaluation of portfolio, runtime prediction, and automated configuration methods. In addition, we demonstrate that this phenomenon can have a significant impact on empirical solver competitions. Specifically, we show that the top three solvers from the 2014 SAT Competition could have been ranked in any permutation. These findings demonstrate the need for more statistically well-founded regimes in empirical evaluations.
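
The ranking claim can be made concrete with a small resampling sketch. This is an illustration only, not the authors' procedure: the function name `ranking_distribution`, the data layout, the scoring rule (instances solved, then capped total runtime), and the toy runtimes are all assumptions introduced here. It simulates many single-run competitions by drawing one of the repeated runs per instance for each solver, and tallies how often each ranking occurs.

```python
import random
from collections import Counter


def ranking_distribution(runtimes, timeout, n_samples=1000, seed=0):
    """Tally solver rankings when one run per instance is drawn at random.

    runtimes: dict mapping solver name -> list of per-instance lists of
              repeated-run runtimes in seconds (hypothetical layout).
    A run counts as solved when its runtime is below `timeout`.
    Assumed scoring: more solved instances first, then lower total
    runtime (capped at the timeout), then solver name as a tie-breaker.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_samples):
        scores = {}
        for solver, instances in runtimes.items():
            # Simulate a single-run competition: one sampled run per instance.
            sampled = [rng.choice(runs) for runs in instances]
            solved = sum(t < timeout for t in sampled)
            total_time = sum(min(t, timeout) for t in sampled)
            scores[solver] = (-solved, total_time, solver)
        # Sort solver names by their score tuple to obtain the ranking.
        ranking = tuple(sorted(scores, key=scores.get))
        tally[ranking] += 1
    return tally


if __name__ == "__main__":
    # Synthetic runtimes, invented for illustration; not competition data.
    toy = {
        "solver_a": [[10.0, 6000.0, 40.0], [300.0, 250.0, 9000.0]],
        "solver_b": [[200.0, 180.0, 7000.0], [20.0, 8000.0, 35.0]],
        "solver_c": [[90.0, 5500.0, 60.0], [4000.0, 100.0, 120.0]],
    }
    for ranking, count in ranking_distribution(toy, timeout=5000.0).most_common():
        print(ranking, count)
```

If several distinct rankings each receive a substantial share of the samples, a single-run evaluation cannot separate the solvers reliably, which is the situation the abstract describes for the top three solvers.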