Benchmarking and Computational framework for Optimal Visualization and Interpretability of high-dimensional separable Data
National Institute of Environmental Health Sciences
Abstract
By accounting for uncertainty in cell-state classification and for dependencies among multiple features of the projected metric space produced by the most widely used linear, nonlinear, and neural-network data reduction methods, we propose a robust analytical framework for benchmarking dimensionality reduction methods: the MUltiscale-MUltivariate-MUltilevel Benchmarking and Computational framework for Optimal Visualization and Interpretability of high-dimensional separable Data (MUBCOVID). MUBCOVID uses a multivariate metric to assess five features that characterize the interpretability of a projection: fidelity of coverage, uniformity of spread of the projected data, preservation of the structure of the original dataset, time dependency of the projected data, and the number of outliers in dense clusters. Specifically, it builds a multilevel Bayesian model with moderation effects to benchmark the accuracy of the various methods from the correlations among these features. Under both supervised and semi-supervised settings, we use MUBCOVID to benchmark the performance of three classes of data reduction methods applied to visualize three dynamic biological processes: epithelial-mesenchymal transition (EMT), spermatogenesis, and stem cell reprogramming. Using posterior confidence intervals, we summarize the key features optimized by current methods. We also provide optimal parameter regions for good visualization, and we show both that optimal interpretability of metric maps after data reduction is strongly confounded by time and by data complexity (defined in terms of the dimension of the feature space and the number of underlying cell types) and that no current method uniquely optimizes all features.
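The abstract names five interpretability features but does not give their formulas. The sketch below shows how such features could be scored for a 2-D embedding using common proxy definitions chosen purely for illustration: grid occupancy for coverage, variability of nearest-neighbor distances for spread, k-nearest-neighbor overlap for structure preservation, rank correlation with known time labels for time dependency, and a median/MAD rule for outliers. All function names, thresholds, and definitions here are hypothetical and are not MUBCOVID's actual metrics; the outputs are the kind of per-method feature scores a multilevel model could then take as input.

```python
import numpy as np

# Hypothetical proxy metrics for a 2-D embedding Y of high-dimensional data X.
# These are illustrative definitions, not the MUBCOVID formulas.

def pairwise_dist(X):
    # Euclidean distance matrix via broadcasting (adequate for small n)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def knn_indices(X, k):
    D = pairwise_dist(X)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]

def coverage(Y, bins=10):
    """Fraction of cells in a bins x bins grid over the bounding box holding >= 1 point."""
    H, _, _ = np.histogram2d(Y[:, 0], Y[:, 1], bins=bins)
    return float((H > 0).mean())

def spread_uniformity(Y):
    """1 / (1 + coefficient of variation of nearest-neighbor distances); 1.0 = perfectly even."""
    D = pairwise_dist(Y)
    np.fill_diagonal(D, np.inf)
    nn = D.min(axis=1)
    return float(1.0 / (1.0 + nn.std() / nn.mean()))

def structure_preservation(X, Y, k=10):
    """Mean fraction of each point's k nearest neighbors shared by X and Y."""
    a, b = knn_indices(X, k), knn_indices(Y, k)
    return float(np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)]))

def time_dependency(Y, t):
    """|Spearman rho| between time labels t and the embedding's first principal axis."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    pc1 = Yc @ Vt[0]
    # Rank transform; ties are broken by order, which is fine for continuous data
    r1, r2 = np.argsort(np.argsort(pc1)), np.argsort(np.argsort(t))
    return float(abs(np.corrcoef(r1, r2)[0, 1]))

def n_outliers(Y, labels, z=3.0):
    """Count points farther than median + z * MAD from their own cluster centroid."""
    count = 0
    for c in np.unique(labels):
        pts = Y[labels == c]
        d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
        med = np.median(d)
        mad = np.median(np.abs(d - med))
        count += int((d > med + z * mad).sum())
    return count
```

As a usage example, `structure_preservation(X, Y, k=15)` scores how well an embedding `Y` keeps the local neighborhoods of `X`. Computing all five scores for each method, dataset, and parameter setting yields a table of correlated features, which is the kind of input a multilevel model would pool across.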
In addition, by implementing and optimizing a joint variational and contractive autoencoder (oVAE), we demonstrate how MUBCOVID not only quantifies the visualization and interpretability performance of a new data reduction method but also establishes oVAE as an optimal benchmarking method when the user is uncertain which visualization feature to optimize. This study provides an unbiased benchmarking framework and model characterization for optimal visualization and interpretability of relationships, represented as metric maps, during dynamic biological processes after data reduction.
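The abstract does not describe oVAE's architecture or objective. The sketch below shows one plausible way to combine the two ingredients its name refers to: the standard variational autoencoder loss (reconstruction error plus the KL divergence of the Gaussian posterior from a standard normal prior) and the contractive penalty (squared Frobenius norm of the encoder's Jacobian with respect to the input). Assumptions, all hypothetical: a single sigmoid hidden layer, a linear decoder, the penalty applied to the posterior mean `mu`, and illustrative weights `beta` and `lam`. The helpers `init_params` and `joint_loss` are invented for illustration. The code evaluates the objective only; training would need gradients, typically from an autodiff framework.

```python
import numpy as np

# Hypothetical joint VAE + contractive objective (forward pass only).
# Not the oVAE implementation; architecture and weights are assumptions.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(d_in, d_hidden, d_latent, seed=0, scale=0.1):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, scale, (d_hidden, d_in)), "b1": np.zeros(d_hidden),
        "Wmu": rng.normal(0, scale, (d_latent, d_hidden)), "bmu": np.zeros(d_latent),
        "Wlv": rng.normal(0, scale, (d_latent, d_hidden)), "blv": np.zeros(d_latent),
        "Wd": rng.normal(0, scale, (d_in, d_latent)), "bd": np.zeros(d_in),
    }

def joint_loss(p, X, beta=1.0, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    H = sigmoid(X @ p["W1"].T + p["b1"])            # hidden activations, (n, d_hidden)
    mu = H @ p["Wmu"].T + p["bmu"]                  # posterior means, (n, d_latent)
    logvar = H @ p["Wlv"].T + p["blv"]              # posterior log-variances
    eps = rng.standard_normal(mu.shape)
    Z = mu + np.exp(0.5 * logvar) * eps             # reparameterization trick
    X_hat = Z @ p["Wd"].T + p["bd"]                 # linear decoder
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), averaged over samples
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    # Jacobian of mu w.r.t. x per sample: Wmu @ diag(h * (1 - h)) @ W1
    dH = H * (1 - H)
    J = np.einsum("kj,nj,ji->nki", p["Wmu"], dH, p["W1"])  # (n, d_latent, d_in)
    contractive = np.mean(np.sum(J**2, axis=(1, 2)))        # squared Frobenius norm
    total = recon + beta * kl + lam * contractive
    return {"total": total, "recon": recon, "kl": kl, "contractive": contractive}
```

The contractive term discourages the latent mean from changing sharply under small input perturbations, while the KL term regularizes the latent distribution. How oVAE actually weights or optimizes these terms is not stated in the abstract.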
View original record on NIH RePORTER →