Mutual Information Estimation using LSH Sampling

Ryan Spring, Anshumali Shrivastava

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 2807-2815. https://doi.org/10.24963/ijcai.2020/389

Learning representations in an unsupervised or self-supervised manner is a growing area of research. Current approaches in representation learning seek to maximize the mutual information between the learned representation and the original data. One of the most popular ways to estimate mutual information (MI) is based on Noise Contrastive Estimation (NCE). This MI estimate exhibits low variance, but it is upper-bounded by log(N), where N is the number of samples. In an ideal scenario, we would use the entire dataset to get the most accurate estimate. However, using such a large number of samples is computationally prohibitive. Our proposed solution is to decouple the upper bound of the MI estimate from the sample size. Instead, we estimate the partition function of the NCE loss function for the entire dataset using importance sampling (IS). In this paper, we use locality-sensitive hashing (LSH) as an adaptive sampler and propose an unbiased estimator that accurately approximates the partition function in sub-linear (near-constant) time. The samples are correlated and non-normalized, but the derived estimator is unbiased without any assumptions. We show that our LSH sampling estimate provides a superior bias-variance trade-off when compared to other state-of-the-art approaches.
Keywords:
Machine Learning: Big Data, Scalability
Data Mining: Big Data, Large-Scale Systems
Data Mining: Clustering, Unsupervised Learning
Machine Learning: Bayesian Optimization
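
The central computational idea described in the abstract, estimating the NCE partition function over the full dataset with LSH-based importance sampling, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a SimHash (signed random projection) hash family, an exponential dot-product score exp(q.x), and the standard SimHash collision probability (1 - theta/pi) per hyperplane; all function names and parameters here are hypothetical. Each point retrieved from the hash tables is reweighted by the inverse of its (known, closed-form) retrieval probability, so the estimate of Z = sum_x exp(q.x) is unbiased even though the retrieved samples are correlated and non-uniform.

import numpy as np

def simhash_tables(data, n_tables, n_bits, rng):
    # Build SimHash (signed random projection) tables over the dataset.
    projections = rng.standard_normal((n_tables, n_bits, data.shape[1]))
    tables = []
    for t in range(n_tables):
        codes = (data @ projections[t].T > 0).astype(np.uint8)
        buckets = {}
        for idx, code in enumerate(map(tuple, codes)):
            buckets.setdefault(code, []).append(idx)
        tables.append(buckets)
    return projections, tables

def retrieval_prob(query, x, n_bits, n_tables):
    # P(x shares a bucket with the query in at least one table),
    # using the SimHash per-hyperplane collision probability 1 - theta/pi.
    cos = np.clip(x @ query / (np.linalg.norm(x) * np.linalg.norm(query)), -1.0, 1.0)
    p_bit = 1.0 - np.arccos(cos) / np.pi
    p_table = p_bit ** n_bits
    return 1.0 - (1.0 - p_table) ** n_tables

def lsh_partition_estimate(query, data, projections, tables, n_bits):
    # Importance-sampling estimate of Z = sum_x exp(query . x):
    # each retrieved point contributes exp(query . x) / P(x retrieved),
    # which makes the estimate unbiased since retrieval probabilities are known.
    n_tables = len(tables)
    retrieved = set()
    for t, buckets in enumerate(tables):
        code = tuple((projections[t] @ query > 0).astype(np.uint8))
        retrieved.update(buckets.get(code, []))
    z_hat = 0.0
    for idx in retrieved:
        p = retrieval_prob(query, data[idx], n_bits, n_tables)
        z_hat += np.exp(query @ data[idx]) / p
    return z_hat

rng = np.random.default_rng(0)
data = rng.standard_normal((10000, 32))   # toy "dataset" of candidate samples
query = rng.standard_normal(32)           # toy representation / query vector
proj, tables = simhash_tables(data, n_tables=8, n_bits=10, rng=rng)
z_full = np.exp(data @ query).sum()       # exact partition function, O(N)
z_lsh = lsh_partition_estimate(query, data, proj, tables, n_bits=10)
print(f"exact Z: {z_full:.1f}   LSH estimate: {z_lsh:.1f}")

In this toy setting the hyperplanes and tables are drawn independently, so the per-point retrieval probability is exact and only the points that actually collide with the query ever need to be scored; that is the sense in which the partition-function estimate costs far less than a pass over all N samples.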