CAREER: AF: Giving Form to Data with a Geometric Scaffold

$505,046 | FY2018 | CSE | NSF

University of Texas at Dallas, Richardson, TX

Abstract

Using geometry to find structure in data is an old idea (Plato is quoted as saying, "God ever geometrizes") that gains fresh application as our data changes. Today's data sets, from areas such as machine learning, are often massive and high-dimensional: for example, when trying to classify news articles, each article may be represented as a point where the frequency of each word is a different dimension. This leads to geometries that are hard to grasp intuitively and hard to work with computationally (the "curse of dimensionality"). Finding a smaller and lower-dimensional subset of the data points that approximately preserves geometric structure not only reduces computation time but can also improve results by suppressing extraneous features. Representing inter-point distances of a general space in a more structured space can support new operations, such as data visualization by mapping to the 2-D plane of the screen, or more efficient computation by mapping to the hierarchical structure of a tree. This project takes the age-old practice of teasing out geometric structure and applies it to the large, high-dimensional data sets of the modern world. By taking a geometric approach to foundational problems in areas such as big data and machine learning, this project seeks to more closely connect computational geometry and these other areas, in turn both modernizing the classical field of computational geometry and advancing these other areas. The educational goals of this project will be achieved by directly supporting student research on the outlined topics, incorporating these topics into new courses, and organizing regular seminars in order to grow the visibility and interdisciplinary nature of algorithms and theoretical computer science at the awardee institution.
Given a data set, the goal is to specify its geometric structure, use this structure to summarize the data and embed it into simpler spaces where computations can be done efficiently, and, when this is not possible, identify how to minimally fix the data to facilitate these tasks. The project's focus is on three interrelated topics concerning geometric structure that lie at the intersection of big data, geometry, and machine learning: 1) data factorization and sparsification, 2) metric embeddings for structured spaces, and 3) metric violation distance. The ultimate purpose is to develop better algorithms for handling data, ranging from better clustering algorithms to better classification algorithms. The ubiquity of such algorithms implies that any progress has the potential for significant real-world impact. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
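
The abstract's news-article example, where each article becomes a point with one dimension per word, can be made concrete with a small sketch. The toy articles, the whitespace tokenization, the function names, and the choice of Euclidean distance below are all assumptions made for illustration; none of them come from the award record or describe the project's actual methods.

```python
import math
from collections import Counter


def word_frequency_vector(text, vocabulary):
    """Map a text to a vector with one coordinate per vocabulary word."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that do not occur in this text.
    return [counts[word] for word in vocabulary]


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy "articles"; a real corpus has many thousands of distinct words.
articles = [
    "markets rally as markets reopen",
    "team wins final match",
    "markets fall after final report",
]

# One dimension per distinct word across all articles.
vocabulary = sorted({word for text in articles for word in text.lower().split()})
points = [word_frequency_vector(text, vocabulary) for text in articles]

print("dimensions:", len(vocabulary))
print("distance between articles 0 and 2:", euclidean_distance(points[0], points[2]))
```

Even these three short sentences produce 11 dimensions, which hints at why realistic corpora lead to the high-dimensional geometries the abstract describes.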
