Mutual Information Estimation using LSH Sampling

Ryan Spring, Anshumali Shrivastava

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 2807-2815. https://doi.org/10.24963/ijcai.2020/389

Learning representations in an unsupervised or self-supervised manner is a growing area of research. Current approaches in representation learning seek to maximize the mutual information between the learned representation and the original data. One of the most popular ways to estimate mutual information (MI) is based on Noise Contrastive Estimation (NCE). This MI estimate exhibits low variance, but it is upper-bounded by log(N), where N is the number of samples. In an ideal scenario, we would use the entire dataset to get the most accurate estimate. However, using such a large number of samples is computationally prohibitive. Our proposed solution is to decouple the upper bound of the MI estimate from the sample size. Instead, we estimate the partition function of the NCE loss function for the entire dataset using importance sampling (IS). In this paper, we use locality-sensitive hashing (LSH) as an adaptive sampler and propose an unbiased estimator that accurately approximates the partition function in sub-linear (near-constant) time. The samples are correlated and non-normalized, but the derived estimator is unbiased without any assumptions. We show that our LSH sampling estimate provides a superior bias-variance trade-off when compared to other state-of-the-art approaches.
Keywords:
Machine Learning: Big Data, Scalability
Data Mining: Big Data, Large-Scale Systems
Data Mining: Clustering, Unsupervised Learning
Machine Learning: Bayesian Optimization
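
The central computational idea described in the abstract, estimating the NCE partition function over the full dataset with LSH-based importance sampling, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a SimHash (signed random projection) hash family, an exponential dot-product score exp(q.x), and the standard SimHash collision probability (1 - theta/pi) per hyperplane; all function names and parameters here are hypothetical. Each point retrieved from the hash tables is reweighted by the inverse of its (known, closed-form) retrieval probability, so the estimate of Z = sum_x exp(q.x) is unbiased even though the retrieved samples are correlated and non-uniform.

import numpy as np

def simhash_tables(data, n_tables, n_bits, rng):
    # Build SimHash (signed random projection) tables over the dataset.
    projections = rng.standard_normal((n_tables, n_bits, data.shape[1]))
    tables = []
    for t in range(n_tables):
        codes = (data @ projections[t].T > 0).astype(np.uint8)
        buckets = {}
        for idx, code in enumerate(map(tuple, codes)):
            buckets.setdefault(code, []).append(idx)
        tables.append(buckets)
    return projections, tables

def retrieval_prob(query, x, n_bits, n_tables):
    # P(x shares a bucket with the query in at least one table),
    # using the SimHash per-hyperplane collision probability 1 - theta/pi.
    cos = np.clip(x @ query / (np.linalg.norm(x) * np.linalg.norm(query)), -1.0, 1.0)
    p_bit = 1.0 - np.arccos(cos) / np.pi
    p_table = p_bit ** n_bits
    return 1.0 - (1.0 - p_table) ** n_tables

def lsh_partition_estimate(query, data, projections, tables, n_bits):
    # Importance-sampling estimate of Z = sum_x exp(query . x):
    # each retrieved point contributes exp(query . x) / P(x retrieved),
    # which makes the estimate unbiased since retrieval probabilities are known.
    n_tables = len(tables)
    retrieved = set()
    for t, buckets in enumerate(tables):
        code = tuple((projections[t] @ query > 0).astype(np.uint8))
        retrieved.update(buckets.get(code, []))
    z_hat = 0.0
    for idx in retrieved:
        p = retrieval_prob(query, data[idx], n_bits, n_tables)
        z_hat += np.exp(query @ data[idx]) / p
    return z_hat

rng = np.random.default_rng(0)
data = rng.standard_normal((10000, 32))   # toy "dataset" of candidate samples
query = rng.standard_normal(32)           # toy representation / query vector
proj, tables = simhash_tables(data, n_tables=8, n_bits=10, rng=rng)
z_full = np.exp(data @ query).sum()       # exact partition function, O(N)
z_lsh = lsh_partition_estimate(query, data, proj, tables, n_bits=10)
print(f"exact Z: {z_full:.1f}   LSH estimate: {z_lsh:.1f}")

In this toy setting the hyperplanes and tables are drawn independently, so the per-point retrieval probability is exact and only the points that actually collide with the query ever need to be scored; that is the sense in which the partition-function estimate costs far less than a pass over all N samples.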