Anticipatory Troubleshooting
Netantel Hasidi, Roni Stern, Meir Kalech, Shulamit Reches
Troubleshooting is the process of diagnosing and repairing a system that is behaving abnormally. Diagnostic and repair actions may incur costs, and traditional troubleshooting algorithms aim to minimize the costs incurred until the system is fixed. We propose an anticipatory troubleshooting algorithm that reasons about both current and future failures. To reason about failures over time, we incorporate statistical tools from survival analysis that enable predicting when a failure is likely to occur. Incorporating this prognostic information into a troubleshooting algorithm enables (1) better fault isolation and (2) more intelligent decisions about which repair actions to employ in order to minimize troubleshooting costs over time.
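To make the two claimed benefits concrete, here is a minimal sketch, not the authors' algorithm, that assumes each component's lifetime follows a Weibull distribution with known, hypothetical `shape` and `scale` parameters. For fault isolation, it assumes exactly one of the candidate components has failed; under that single-fault assumption the posterior of component c is proportional to F_c(t)/S_c(t), where S is the survival function and F = 1 - S. For the repair decision, it assumes a cheap "repair" leaves the component's age unchanged while a costlier "replace" resets it to zero, and it counts at most one future failure within a hypothetical planning `horizon`. All names (`rank_suspects`, `choose_repair_action`, the component data, and the costs) are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def weibull_survival(age: float, shape: float, scale: float) -> float:
    """S(t) = exp(-(t/scale)^shape): probability of surviving past age t (assumed Weibull model)."""
    return math.exp(-((age / scale) ** shape))


def rank_suspects(components: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float]]:
    """Rank candidate faulty components by posterior probability.

    components maps name -> (age, shape, scale). Assumes exactly one candidate
    failed, so P(c is the fault) is proportional to F_c / S_c (the product of
    all survival terms cancels during normalization).
    """
    odds = {}
    for name, (age, shape, scale) in components.items():
        s = weibull_survival(age, shape, scale)
        odds[name] = (1.0 - s) / s
    total = sum(odds.values())
    return sorted(((n, o / total) for n, o in odds.items()), key=lambda x: x[1], reverse=True)


def prob_fail_within(age: float, horizon: float, shape: float, scale: float) -> float:
    """P(T <= age + horizon | T > age) = 1 - S(age + horizon) / S(age)."""
    return 1.0 - weibull_survival(age + horizon, shape, scale) / weibull_survival(age, shape, scale)


def choose_repair_action(age: float, repair_cost: float, replace_cost: float,
                         horizon: float, shape: float, scale: float,
                         failure_cost: float) -> Tuple[str, Dict[str, float]]:
    """Pick the action with the lowest expected cost over the horizon.

    Expected cost = immediate action cost + P(another failure within horizon) * failure_cost.
    "repair" is assumed to keep the current age; "replace" installs a new part (age 0).
    """
    costs = {
        "repair": repair_cost + prob_fail_within(age, horizon, shape, scale) * failure_cost,
        "replace": replace_cost + prob_fail_within(0.0, horizon, shape, scale) * failure_cost,
    }
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    # Hypothetical components: (age in hours, Weibull shape, Weibull scale).
    suspects = {"pump": (900.0, 2.0, 1000.0),
                "valve": (200.0, 1.5, 2000.0),
                "sensor": (50.0, 1.0, 5000.0)}
    print(rank_suspects(suspects)[0][0])  # pump

    action, costs = choose_repair_action(age=900.0, repair_cost=100.0, replace_cost=400.0,
                                         horizon=500.0, shape=2.0, scale=1000.0,
                                         failure_cost=1000.0)
    print(action)  # replace
```

In this toy setting a myopic policy would pick the cheaper "repair" (100 vs. 400), whereas accounting for the aging pump's likely future failure makes "replace" cheaper in expectation, which is the kind of trade-off the abstract attributes to anticipatory troubleshooting.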