Matrix estimation under rank constraints for complete and incomplete noisy data

$220,263 · FY2011 · MPS · NSF

Cornell University, Ithaca NY

Abstract

The central goals of this proposal are: (a) to provide methods for estimating matrices of unknown rank from both completely and incompletely observed noisy matrices, using rank-regularized risk minimization, and (b) to establish novel oracle-type risk bounds for the matrix estimates and the rank estimates under minimal assumptions. The difficulty in recovering the underlying target matrix from an observed noisy matrix is that the number of independent parameters is large relative to the number of observations. Special attention is given to multivariate response regression models. There is an interesting resemblance between matrix estimation under low-rank assumptions and estimation in general regression models under sparsity assumptions, but matrix models pose different mathematical and computational challenges. High-dimensional data arranged in matrix format are increasingly common in many scientific disciplines, such as genetics, medical imaging, engineering, psychology and neuroscience. The matrices containing observed data in these areas tend to have high rank because of noise, but the signal matrix underlying the data may have significantly lower rank. Ignoring this in any inferential procedure may lead to poor recovery of the target, with severe repercussions for the interpretation of the results. Instances of targets that must be recovered with the highest possible precision include faces against a background, ensembles of genes that are associated with a disease, and brain structures associated with cognitive processes, to name just a few examples. Some of the challenges associated with the analysis of such data can be met via the methodological and theoretical study of the problem of matrix estimation under rank constraints. A second problem, which is substantially more difficult, is to perform the same task when only partially observed noisy matrices are available.
Systematic investigation of these two problems is the focus of this proposal. The techniques will be disseminated to the scientific community immediately by applying them to data obtained from a study of the effects of HIV on brain structure and function. Free software that implements the developed methodology will be made available on the web in a readily usable form.
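
As an illustration of the first problem (a fully observed noisy matrix), the following is a minimal sketch of one standard form of rank-regularized risk minimization: choosing A to minimize ||Y - A||_F^2 + mu * rank(A). This criterion is solved by keeping only the singular values of Y whose squares exceed mu. The sketch is not taken from the proposal and is not the investigators' software; the function name, the simulated data, and the choice of mu (which assumes the noise level is known) are all assumptions made for the example.

```python
import numpy as np

def rank_penalized_estimate(Y, mu):
    """Minimize ||Y - A||_F^2 + mu * rank(A) over all matrices A.

    The minimizer is a truncated SVD of Y that keeps exactly the
    singular values d_k with d_k**2 > mu (hard thresholding).
    Hypothetical helper written for this illustration.
    """
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    k = int(np.sum(d ** 2 > mu))            # estimated rank
    A_hat = (U[:, :k] * d[:k]) @ Vt[:k, :]  # rank-k reconstruction
    return A_hat, k

# Simulated example: a rank-3 signal observed with Gaussian noise.
rng = np.random.default_rng(0)
m, n, r = 60, 40, 3
signal = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
sigma = 0.5
Y = signal + sigma * rng.normal(size=(m, n))

# Assumed tuning: threshold singular values at twice the typical size of
# the largest singular value of the noise, sigma * (sqrt(m) + sqrt(n)).
mu = (2.0 * sigma * (np.sqrt(m) + np.sqrt(n))) ** 2

A_hat, k_hat = rank_penalized_estimate(Y, mu)
err_raw = np.linalg.norm(Y - signal)      # error of the raw observation
err_hat = np.linalg.norm(A_hat - signal)  # error of the low-rank estimate
print("estimated rank:", k_hat)
print("Frobenius error, raw vs. estimate:", err_raw, err_hat)
```

The observed matrix Y has full rank because of the noise, while the thresholded estimate recovers a low-rank matrix that is closer to the signal. In practice the noise level is rarely known, so mu would have to be chosen from the data. The partially observed case described above requires different techniques and is not covered by this sketch.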
