See without looking: joint visualization of sensitive multi-site datasets

Debbrata K. Saha, Vince D. Calhoun, Sandeep R. Panta, Sergey M. Plis

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2672-2678. https://doi.org/10.24963/ijcai.2017/372

Visualization of high-dimensional, large-scale datasets via an embedding into a 2D map is a powerful exploration tool for assessing latent structure in the data and detecting outliers. Many methods have been developed for this task, but most assume that all pairs of samples are available for common computation. Specifically, the distances between all pairs of points need to be directly computable. In contrast, we work with sensitive neuroimaging data, where local sites cannot share their samples and distances cannot be easily computed across sites. Yet the goal is to let all the local data participate in collaborative computation without leaving their respective sites. In this scenario, a quality control tool that visualizes a decentralized dataset in its entirety via global aggregation of local computations is especially important, as it would allow screening of samples that cannot be evaluated otherwise. This paper introduces an algorithm to solve this problem: decentralized data stochastic neighbor embedding (dSNE). Using the MNIST dataset, we introduce metrics for measuring embedding quality and use them to compare dSNE to its centralized counterpart. We also apply dSNE to a multi-site neuroimaging dataset with encouraging results.
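
The pairwise-distance constraint can be made concrete with a minimal NumPy sketch. It illustrates the problem setting only and is not the dSNE algorithm; the two sites, their sizes, the random data, and the helper `pairwise_sq_dists` are all hypothetical. It shows that each site can compute its own block of the distance matrix, while the cross-site block requires samples from both sites at once.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


rng = np.random.default_rng(0)
site_a = rng.normal(size=(4, 10))  # hypothetical samples held at site A
site_b = rng.normal(size=(3, 10))  # hypothetical samples held at site B

# Centralized case: pooling the samples yields the full 7 x 7 distance matrix.
pooled = np.vstack([site_a, site_b])
full = pairwise_sq_dists(pooled, pooled)

# Decentralized case: each site can compute only its own diagonal block.
block_aa = pairwise_sq_dists(site_a, site_a)
block_bb = pairwise_sq_dists(site_b, site_b)

# The off-diagonal block needs raw samples from both sites, which is exactly
# what private sites cannot exchange. It is computed here only because this
# toy example holds both sites' data in one process.
cross = full[:4, 4:]
```

Here `block_aa` and `block_bb` match the corresponding diagonal blocks of `full`, whereas `cross` has no local equivalent at either site.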
Keywords:
Machine Learning: Data Mining
Machine Learning: Machine Learning
Machine Learning: Unsupervised Learning
Machine Learning: New Problems