CIF: Medium: Collaborative Research: Learning in High Dimensions: From Theory to Data and Back

$597,629 | FY2016 | CSE | NSF

Stanford University, Stanford CA

Abstract

Statistical modeling is the cornerstone of analyzing modern data sets, and using observed data to learn the underlying statistical model is a crucial part of most data analysis tasks. However, with the success of data utilization came a vast increase in its complexity, as expressed in complex models, numerous parameters, and high-dimensional features. This research project studies problems in learning such high-dimensional models, both in theory and in practice with actual datasets in cutting-edge applications. Learning high-dimensional models efficiently, both in terms of computation and in terms of the use of the data, is an important challenge. The research characterizes the fundamental limits on the sample and computational complexity of several key distribution learning problems, as well as the associated optimal learning algorithms that achieve those limits. These learning problems underpin important tasks such as clustering, multiple hypothesis testing, and information measure estimation. The new algorithms and methodologies developed are evaluated on and applied to real data from three specific applications: 1) denoising of high-throughput transcriptomic data; 2) analysis of omics data for personalized medicine; 3) ecological population studies. While these applications are useful in their own right, there are also many other potential applications in fields such as speech recognition, topic modeling, character recognition, and neuroscience.
