
EAGER: Nonparametric Machine Learning on Sets, Functions, and Distributions

$200,000 | FY2012 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

Most machine learning algorithms operate on fixed-dimensional feature vectors. In many applications, however, the natural representation of the data consists of more complex objects, such as functions, distributions, and sets, rather than finite-dimensional vectors. This project aims to develop a new family of machine learning algorithms that operate directly on these complex objects. The key innovation is the efficient estimation of certain information-theoretic quantities, which are then used to learn predictive models from complex data. The research is organized around three specific aims: (a) developing and analyzing nonparametric estimators for important functionals of densities, such as entropy, mutual information, conditional mutual information, and divergence, and studying their theoretical properties, including consistency, convergence rates of the bias and variance, and asymptotic normality; (b) using these estimators to design new learning algorithms for clustering, classification, regression, and anomaly detection that work directly on sets, functions, and distributions, without additional hand-crafted feature extraction, histogram construction, or density estimation steps that could lose information; and (c) studying the theoretical properties of these new algorithms (computation time, sample complexity, generalization error) and evaluating them empirically on a variety of important real-world problems: nuclear detection (with researchers at Lawrence Livermore), astronomical data analysis (with researchers at the University of Washington and Johns Hopkins University), and computer vision (with researchers at Carnegie Mellon University).

Broader Impact. If successful, the project could substantially advance the state of the art in building predictive models from complex data. The results of the research, including publications and open-source software, will be freely disseminated to the wider scientific community. The project also provides enhanced research-based training opportunities for graduate and undergraduate students at Carnegie Mellon University and at the collaborating institutions.
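Aims (a) and (b) combine two ideas: estimate a functional such as divergence directly from samples, with no density estimation or histogram step, and then use that estimate to learn from whole sets. The abstract does not say which estimators the project uses, so the following is only a minimal sketch under stated assumptions. It uses one well-known estimator from the literature, the k-nearest-neighbour KL-divergence estimator of Wang, Kulkarni and Verdú: for samples x_1..x_n ~ P and y_1..y_m ~ Q in d dimensions, D(P||Q) is estimated as (d/n) Σ_i log(ν_k(i)/ρ_k(i)) + log(m/(n−1)), where ρ_k(i) is the distance from x_i to its k-th nearest neighbour among the other x's and ν_k(i) the distance to its k-th nearest neighbour among the y's. The names `knn_kl_divergence`, `classify_set` and `gaussian_sample`, the choice k = 5, and the small distance floor are illustrative assumptions, not the project's code.

```python
import heapq
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kth_nn_distance(x: Point, sample: Sequence[Point], k: int, skip_index: int = -1) -> float:
    """Distance from x to its k-th nearest neighbour in `sample`,
    ignoring sample[skip_index] (used to exclude x itself)."""
    dists = (math.dist(x, y) for j, y in enumerate(sample) if j != skip_index)
    return heapq.nsmallest(k, dists)[-1]


def knn_kl_divergence(xs: Sequence[Point], ys: Sequence[Point], k: int = 5) -> float:
    """k-NN estimate of KL(P || Q) from xs ~ P and ys ~ Q, computed
    directly from the samples with no density estimation step."""
    n, m = len(xs), len(ys)
    if n <= k or m < k:
        raise ValueError("need more than k points in xs and at least k in ys")
    d = len(xs[0])
    eps = 1e-12  # assumption: floor to avoid log(0) if points are duplicated
    total = 0.0
    for i, x in enumerate(xs):
        rho = kth_nn_distance(x, xs, k, skip_index=i)  # within-sample distance
        nu = kth_nn_distance(x, ys, k)                 # cross-sample distance
        total += math.log(max(nu, eps) / max(rho, eps))
    return d * total / n + math.log(m / (n - 1))


def classify_set(query: Sequence[Point],
                 labelled_sets: List[Tuple[str, Sequence[Point]]],
                 k: int = 5) -> Optional[str]:
    """Classify a whole sample set by 1-nearest-neighbour, using the
    estimated KL(query || training set) as the (asymmetric) dissimilarity."""
    best_label, best_div = None, math.inf
    for label, sample in labelled_sets:
        div = knn_kl_divergence(query, sample, k)
        if div < best_div:
            best_label, best_div = label, div
    return best_label


def gaussian_sample(rng: random.Random, mean: float, n: int) -> List[Point]:
    """Hypothetical helper: n one-dimensional points from N(mean, 1)."""
    return [(rng.gauss(mean, 1.0),) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    p = gaussian_sample(rng, 0.0, 300)
    q = gaussian_sample(rng, 2.0, 300)
    # The true value of KL(N(0,1) || N(2,1)) is (0 - 2)^2 / 2 = 2.0.
    print("estimated KL:", knn_kl_divergence(p, q))
    train = [("a", gaussian_sample(rng, 0.0, 200)), ("b", gaussian_sample(rng, 3.0, 200))]
    print("predicted label:", classify_set(gaussian_sample(rng, 0.0, 200), train))
```

The classifier treats each training set as a single object and compares sets through the estimated divergence, which is the kind of "learning directly on sets" described in aim (b). Because KL divergence is not symmetric, a symmetrized divergence or a divergence-based kernel would be a reasonable alternative design.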
