
BIGDATA: F: Data Driven Optimization on Flag Manifolds with Geometric Constraints

$599,999 · FY2016 · CSE · NSF

Colorado State University, Fort Collins CO

Investigators

Abstract

This research concerns the development of innovative mathematical theory and algorithms to facilitate knowledge discovery in the massive data sets generated by scientists, engineers and today's data-driven society. New approaches will be developed that permit the encoding of large quantities of data in a way that enables the detection of similarities and differences buried in the volumes of information. The framework is especially useful for characterizing degrees of similarity, and discovering features or patterns that may be shared between data sets. The project focuses on the use of tools from geometry and optimization to provide effective data representations that expand the toolkit of analysts and enhance their capacity for understanding large and complex data sets. The methodology will be validated on real-world data sets such as extreme weather simulations or biological data sets capturing the human immune response to infection by pathogens. The techniques being developed may be viewed as part of the emerging field of geometric data learning. The mathematical approach exploits the geometric framework of the Grassmannian, the manifold that parameterizes the set of subspaces of a given dimension of a vector space. The appeal of this approach is that subspaces, as abstract points on the Grassmann manifold, are an effective tool to capture the natural variability in data observations stemming from, for example, variations in illumination, or noise. If a subspace of data intersects another subspace of data in some prescribed number of dimensions, then these abstract points should be considered to be more related than subspaces that intersect in fewer dimensions, or not at all. This type of geometric picture, when formulated in a mathematical framework, leads to the use of flag manifolds and Schubert varieties for representing and comparing data.
The proposed research program addresses new problems in data driven optimization subject to geometric constraints, for example, when the feasible set is a Schubert variety. This framework allows us to extract geometric models that characterize patterns, and leads naturally to comparisons between large sets of observations based on similarity measures which are functions of angles between subspaces.
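As a rough illustration of similarity measures built from angles between subspaces, the sketch below computes principal angles between two subspaces with NumPy. It is not the project's method: the function names `principal_angles`, `intersection_dimension`, and `chordal_distance`, the tolerance `tol`, and the toy matrices are all assumptions made for this example, and the chordal distance is just one standard choice among several angle-based measures.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spans of A and B.

    Assumes A and B have full column rank (hypothetical helper, not from the award text).
    """
    # Orthonormal bases for each subspace via reduced QR.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def intersection_dimension(A, B, tol=1e-6):
    """Number of (numerically) zero principal angles, i.e. dim of span(A) ∩ span(B)."""
    return int(np.sum(principal_angles(A, B) < tol))

def chordal_distance(A, B):
    """Chordal distance on the Grassmannian: 2-norm of the sines of the principal angles."""
    return float(np.linalg.norm(np.sin(principal_angles(A, B))))

# Toy data: two 2-dimensional subspaces of R^4 that share exactly one direction (e1).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

print(principal_angles(A, B))
print(intersection_dimension(A, B))  # 1
print(chordal_distance(A, B))
```

In this toy case one principal angle is zero (the shared direction) and the other is a right angle, so the two subspaces would count as more related than a pair with no zero angles. Counting zero angles gives the intersection dimension, which is the kind of incidence condition that the abstract associates with Schubert varieties.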
