Fitting Manifolds to Noisy Data
University Of Washington, Seattle WA
Investigators
Abstract
In recent years, high dimensional statistics has focused on methods of alleviating the curse of dimensionality, which refers to various inconvenient phenomena that arise when analyzing data in high-dimensional spaces. One assumption that facilitates this is the Manifold hypothesis: that data lie in the vicinity of a low dimensional manifold. For example, if we randomly photograph a large number of objects, randomly choose three-by-three pixel patches, and plot the corresponding points in nine-dimensional space, we approximately see a Klein bottle, which is a two dimensional manifold. This could be due to the generating process possessing only a few essential degrees of freedom. Manifold learning is a collection of methodologies for analyzing high dimensional data based on the Manifold hypothesis. This has been an area of intense activity over the past two decades. In work with Fefferman and Mitter, the foundation for a novel approach to Manifold learning has been established. The present project is concerned with using this approach for the building of practically and theoretically efficient algorithms to fit a manifold to data, given that the data lie near a manifold. Current approaches which attempt to build a manifold with some guarantees of success, either forgo the smoothness assumption or are restrictive in some way. In the work with Fefferman and Mitter new techniques based on fiber bundles have been developed and formally justified to verify in finite time whether there is a manifold of bounded reach lying near the data. Since this is an exhaustive process, the algorithm is extremely slow. However, when the data is actually from a manifold with perhaps some corruption by noise, our approach has sufficiently powerful techniques which can be modified and adopted to construct an approximation of the manifold in near linear time on the size of the data. The feasibility of the method has been tested experimentally through implementation of the algorithm on simple examples. Theoretical guarantees of speed will be provided for these algorithms as part of this project and the algorithms will be shown to be practical. At a broader level, we would like to introduce ideas from Whitney interpolation to statistics and engineering.
View original record on NSF Award Search →