When Does Label Propagation Fail? A View from a Network Generative Model


Yuto Yamaguchi, Kohei Hayashi

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 3224-3230. https://doi.org/10.24963/ijcai.2017/450

What kinds of data does Label Propagation (LP) work best on? Can we justify the solution of LP from a theoretical standpoint? LP is a semi-supervised learning algorithm that is widely used to predict unobserved node labels on a network (e.g., a user's gender on an SNS). Despite its importance, its theoretical properties remain mostly unexplored. In this paper, we answer the above questions by interpreting LP from a statistical viewpoint. As our main result, we identify the network generative model behind the discretized version of LP (DLP), and we show that under specific conditions the solution of DLP is equal to the maximum a posteriori estimate of that generative model. Our main result reveals the critical limitations of LP. Specifically, we discover that LP would not work best on networks with (1) disassortative node labels, (2) clusters having different edge densities, (3) non-uniform label distributions, or (4) unreliable observed node labels. Our experiments under a variety of settings support our theoretical results.
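For readers unfamiliar with LP itself, the following is a minimal sketch of the standard iterate-and-clamp formulation of label propagation, not the authors' code or the discretized variant (DLP) analyzed in the paper; all variable names are illustrative.

```python
import numpy as np

def label_propagation(A, Y, labeled_mask, num_iters=100):
    """Minimal label propagation sketch.

    A            : (n, n) symmetric adjacency matrix of the network
    Y            : (n, k) one-hot label matrix; rows of unlabeled nodes are zero
    labeled_mask : (n,) boolean array, True where the node's label is observed
    Returns an (n, k) matrix of propagated label scores.
    """
    # Row-normalize the adjacency matrix so each node averages its neighbors' scores.
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1e-12)

    F = Y.astype(float).copy()
    for _ in range(num_iters):
        F = P @ F                           # propagate scores from neighbors
        F[labeled_mask] = Y[labeled_mask]   # clamp the observed labels
    return F

# Toy usage: a 4-node path graph with the two endpoints labeled.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
mask = np.array([True, False, False, True])
print(label_propagation(A, Y, mask).argmax(axis=1))  # predicted class per node
```

The clamping step assumes the observed labels are fully reliable and that neighboring nodes tend to share labels (assortativity), which is exactly the kind of assumption whose failure modes the paper characterizes.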
Keywords:
Machine Learning: Relational Learning
Machine Learning: Semi-Supervised Learning