Learning from Partial Observations
We present a general machine learning framework for modelling the phenomenon of missing information in data. We propose a masking process model to capture the stochastic nature of information loss. Learning in this context is employed as a means to recover as much of the missing information as is recoverable. We extend the Probably Approximately Correct semantics to the case of learning from partial observations with arbitrarily hidden attributes. We establish that simply requiring learned hypotheses to be consistent with observed values suffices to guarantee that hidden values are recoverable to a certain accuracy; we also show that, in some sense, this is an optimal strategy for achieving accurate recovery. We then establish that a number of natural concept classes, including all the classes of monotone formulas that are PAC learnable by monotone formulas, and the classes of conjunctions, disjunctions, k-CNF, k-DNF, and linear thresholds, are consistently learnable from partial observations. We finally show that the concept classes of parities and monotone term 1-decision lists are not properly consistently learnable from partial observations, if RP ≠ NP. This implies a separation of what is consistently learnable from partial observations versus what is learnable in the complete or noisy setting.