BIGDATA: F: DKA: Learning a Union of Subspaces from Big and Corrupted Data

$608,530FY2014CSENSF

Johns Hopkins University, Baltimore MD

Investigators

Abstract

This project develops theory and algorithms for automatically discovering multiple low-dimensional structures in high-dimensional data, and evaluates these algorithms in image clustering applications. The developed techniques enhance our ability to handle big data problems from multiple sources and modalities, and advance the knowledge on how to interpret massive amounts of complex high-dimensional data. The techniques developed in this project can significantly broaden the applicability of existing results in sparse representation theory to subspace clustering problems, which have found widespread applications in image processing (e.g., image denoising, compression, representation, and segmentation), computer vision (e.g., motion segmentation and face clustering) and dynamical systems (e.g., hybrid system identification). This research develops provably correct and scalable algorithms for learning a union of low-dimensional subspaces from big and corrupted data. The algorithms are based on the so-called self-expressiveness property of the data, which states that an uncorrupted data point can be well approximated by an affine combination of other uncorrupted data points. This research shows that by imposing a structured sparse and low-rank prior on the coefficients, one can discover multiple structures in the data. In the case of uncorrupted data, the research team studies conditions on the data under which a perfect clustering is possible. In the case of data corrupted by outliers, the research team studies conditions under which perfect clustering and outlier rejection are possible. In the case of data with missing entries, the research team studies conditions under which perfect clustering and data completion are possible. The project also develops efficient and scalable algorithms that benefit from distributed and high-performance computing for solving the various subspace clustering problems. These algorithms enable solving large-scale problems in computer vision, including image clustering.

View original record on NSF Award Search →