CRII: CIF: Generalizations for Matrix and Tensor Estimation
Cornell University, Ithaca, NY
Investigators
Abstract
Matrix and tensor estimation are core building blocks in data science and machine learning for dealing with missing data, and they are used widely in domains such as social computing, computer vision, and computational biology. This project seeks to extend these techniques to handle model variations that arise in real-world datasets: data collection is rarely uniform, and datasets often combine interaction data with covariate or feature information. For example, a biological dataset might contain known properties of individual genes in addition to information about how genes interact. Because the interaction data are collected from real experiments, they may be highly non-uniformly distributed. The techniques developed in this project could make predictions on such a dataset more efficient, requiring less experimental data.

The technical goals of this project are to generalize the theory and algorithms of matrix and tensor estimation beyond uniform sampling models, and to design optimally efficient algorithms that combine side information with matrix interaction data. The proposed approach focuses on similarity-based collaborative filtering algorithms. For each of these model variations, the researchers plan to characterize information-theoretic thresholds and minimax-optimal estimation error rates, to design and analyze computationally and statistically efficient algorithms, and to provide confidence sets that quantify the uncertainty of the estimates. These results will substantially broaden the settings in which matrix and tensor estimation methods can be applied, including sequential decision making and high-dimensional scientific data analysis.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
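The abstract names similarity-based collaborative filtering as the focus of the approach but does not spell out an estimator. As a rough illustration only, the sketch below implements a generic user-user nearest-neighbor estimator: rows are compared only on columns that both have observed, and each missing entry is filled with the average of the k closest rows that observed that column. The function name `user_user_cf`, the mean-squared-difference distance, and the parameters `k` and `min_overlap` are hypothetical choices made for this example. This is not the project's algorithm and carries none of the guarantees (thresholds, minimax rates, confidence sets) the project aims to establish.

```python
import numpy as np


def user_user_cf(Y, mask, k=2, min_overlap=2):
    """Fill missing entries of Y with a generic row-neighbor average (illustrative sketch).

    Y: (n, m) float array; entries where mask is False are never read.
    mask: (n, m) boolean array, True where the entry was observed.
    k: number of nearest rows to average (hypothetical parameter).
    min_overlap: minimum number of co-observed columns needed to compare two rows.
    Returns a float array with observed entries copied and missing ones estimated,
    or left as NaN when no eligible neighbor exists.
    """
    n, m = Y.shape
    est = np.where(mask, Y, np.nan)

    # Row distance: mean squared difference over columns observed in both rows.
    dist = np.full((n, n), np.inf)
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            both = mask[u] & mask[v]
            if both.sum() >= min_overlap:
                dist[u, v] = np.mean((Y[u, both] - Y[v, both]) ** 2)

    for u in range(n):
        for j in range(m):
            if mask[u, j]:
                continue
            # Eligible neighbors observed column j and are comparable to row u.
            cands = [v for v in range(n) if mask[v, j] and np.isfinite(dist[u, v])]
            if not cands:
                continue  # no information about this entry; leave it as NaN
            cands.sort(key=lambda v: dist[u, v])
            est[u, j] = np.mean([Y[v, j] for v in cands[:k]])
    return est


if __name__ == "__main__":
    # Two clusters of identical rows; two entries are hidden.
    Y = np.array([[1., 2., 3., 4.], [1., 2., 3., 4.], [5., 6., 7., 8.],
                  [5., 6., 7., 8.], [1., 2., 3., 4.], [5., 6., 7., 8.]])
    mask = np.ones_like(Y, dtype=bool)
    mask[0, 3] = False
    mask[2, 0] = False
    est = user_user_cf(np.where(mask, Y, 0.0), mask, k=2)
    print(float(est[0, 3]), float(est[2, 0]))  # 4.0 5.0
```

Because rows are compared only on co-observed columns, the estimator never reads unobserved values, but it does nothing to correct for non-uniform sampling: a row with too few observations may have no eligible neighbors, and its missing entries stay NaN.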