CRII: CIF: RUI: Exploiting Geometry in Robust Signal Processing and Feature Extraction

$174,263 | FY2020 | CSE | NSF

Salisbury University, Salisbury MD

Investigators

Abstract

In any quantitative endeavor, one must make measurements. One learns early and often that these measurements can have, depending on the source, varying degrees of reliability. Inaccurate measurements of this kind are typically treated as "noise". Gathering billions of data observations per second from millions of different sources creates myriad problems; this project seeks to develop new ways to identify and counteract noise and the instability it causes in numerical and statistical tools. Identifying and extracting systematic outliers from the observations can be useful in applications in cybersecurity, networking, and privacy. The project motivates these techniques through the lens of several existing problem frameworks that have been successful in practice and in theory for decades, but that still have shortcomings in the presence of systematic outliers. The tools developed during this project could be employed by virtually any practitioner of data science or machine learning. The ideas, specifically those dealing with outlier detection and rejection, are also relevant to communication, medicine, geology, and social choice. It has been shown recently that such very noisy models can be handled by techniques that take a geometric perspective of the data and impose structural constraints that are not typically realized in the real world, but that preserve enough features of the data for classical notions of data processing and statistical analysis to be applied. In this project, new practical algorithms will be developed for feature extraction that use the floating body as a measure of data dispersion. This geometric object has been proven to capture enough of the structure of a data set that one can still accurately extract important structural features, even when significant portions of the data are corrupted.
This tool enables classical algorithms such as Independent Component Analysis (ICA) and Principal Component Analysis (PCA) to operate on data with heavy-tailed noise, which would otherwise cause numerical instability. Further, the project will use the Generalized Central Limit Theorem (GCLT) to identify more general assumptions under which one can take advantage of robust statistical tools and optimization techniques. The advantage of this approach is that the GCLT can be used to prove convergence of numerical algorithms even when the underlying model itself does not exactly fit the parameters of robust estimation routines. Finally, the project will further the use of ICA as a tool in algorithmic reductions for data analysis. ICA has already been shown to enable efficient learning algorithms, demonstrating that some geometric learning problems (e.g., learning halfspace intersections) can be tackled by statistical techniques. This nicely complements the above research directions, which aim to break algorithmic barriers in statistical problems by using primarily geometric tools. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
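
To make the idea of a robust, geometry-based dispersion measure concrete, the following is a minimal one-dimensional sketch, not the project's algorithm. It assumes that, on the real line, the floating body at level delta reduces to the interval between the delta and (1 - delta) quantiles, and it compares the width of that interval with the ordinary standard deviation on heavy-tailed (standard Cauchy) samples. The function names cauchy_samples, floating_interval, and dispersion_report are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def cauchy_samples(n, seed=0):
    """Draw n standard Cauchy samples with the inverse-CDF method."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def floating_interval(data, delta):
    """Return the empirical interval [q_delta, q_(1 - delta)].

    Assumption for this sketch: in one dimension, cutting away every
    halfline that carries mass delta leaves this interquantile interval.
    Quantiles are taken as order statistics, without interpolation.
    """
    xs = sorted(data)
    lo = xs[int(delta * (len(xs) - 1))]
    hi = xs[int((1 - delta) * (len(xs) - 1))]
    return lo, hi


def dispersion_report(data, delta=0.25):
    """Compare the standard deviation with the floating-interval width."""
    lo, hi = floating_interval(data, delta)
    return {"std": statistics.pstdev(data), "width": hi - lo}


if __name__ == "__main__":
    # Repeat over a few seeds: the interval width barely moves, while the
    # standard deviation is driven by a handful of extreme observations.
    for seed in range(3):
        report = dispersion_report(cauchy_samples(20000, seed=seed))
        print(seed, report)
```

For the standard Cauchy distribution the quartiles sit at -1 and +1, so with delta = 0.25 the interval width should stay close to 2 across runs, whereas the standard deviation has no finite population value to converge to. The multivariate floating body described in the abstract is a richer object; this sketch only illustrates why a quantile-style geometric measure can remain stable when moment-based measures do not.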
