High-Dimensional Covariance Estimation via Convex Optimization

$120,001 · FY2014 · MPS · NSF

Cornell University, Ithaca NY

Investigators

Abstract

Modern technologies allow researchers to measure an unprecedentedly large number of attributes regarding the subjects of their study. An important question in many applications is how one can infer from such data the underlying relationships between these attributes. Such an objective can be expressed through a fundamental construct in the field of statistics known as the covariance matrix. Using classical statistical techniques would require one to collect data on a prohibitively large number of subjects to reliably estimate this matrix. In this work, novel statistical methods will be developed that allow researchers to make sound inferences by making better use of their data given the limited number of subjects they have available. The methods developed will be applicable in a wide range of fields. For example, in biology, one can infer the structures of massive networks of genes based on a small number of samples. Beyond being an end in itself, the covariance matrix is a key ingredient in many common statistical procedures. Thus, by developing the ability to reliably estimate it from small numbers of subjects, this work will enable the use of many other methods that would otherwise be unavailable to researchers. Application areas include disease diagnosis, basic biology, sensor networks, and social networks.

The research program will focus on high-dimensional covariance estimation and capitalize on the strengths of the convex framework to develop novel statistical methodology. This work will involve developing efficient algorithms, thoroughly investigating the properties of estimators and algorithms through a combination of theory and simulation, and applying methods to real datasets. The research focus is in two main areas: (A) In certain applications, the variables have a known ordering. Such structure suggests the use of a convex penalty not previously applied to covariance estimation. This work will carefully study using such a penalty to estimate both the covariance matrix and the inverse covariance matrix. (B) Estimating a covariance matrix as a simultaneously sparse and positive definite matrix is a natural goal, and yet the standard penalized likelihood approach is not convex. This research will develop convex-optimization-based estimators that still make use of the likelihood. For all projects, software will be produced, made freely available online, and maintained so that other researchers can benefit from its use.
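
To illustrate why classical estimation breaks down when attributes outnumber subjects, the following minimal sketch computes the ordinary sample covariance matrix on simulated data. It is not one of the estimators proposed in this award; the sizes `n` and `p`, the random seed, and the Gaussian data are assumptions chosen only for illustration. With fewer subjects than attributes, the sample covariance has rank at most `n - 1`, so it cannot be inverted, and procedures that need the inverse covariance matrix are unavailable.

```python
import numpy as np

# Hypothetical sizes: fewer subjects (n) than measured attributes (p).
n, p = 20, 50

rng = np.random.default_rng(0)
# Simulated data whose true covariance is the identity matrix.
X = rng.standard_normal((n, p))

# Classical estimator: the p x p sample covariance matrix.
S = np.cov(X, rowvar=False)

# Centering removes one degree of freedom, so rank(S) <= n - 1 < p.
rank = np.linalg.matrix_rank(S)
# The smallest eigenvalue is zero up to floating-point error,
# so S is singular even though the true covariance is not.
min_eig = np.linalg.eigvalsh(S).min()

print(f"shape={S.shape}, rank={rank}, smallest eigenvalue={min_eig:.2e}")
```

Structured estimators of the kind described above aim to recover a usable, positive definite estimate in exactly this regime by exploiting assumptions such as variable ordering or sparsity.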
