CAREER: New Representations of Probability Distributions to Improve Machine Learning --- A Unified Kernel Embedding Framework for Distributions
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
Computational intelligence touches our lives daily. Web search, weather prediction, financial fraud detection, medicine, and education all benefit from this ubiquitous technology. Problems in computational intelligence, such as image classification and predicting the properties of new materials, produce copious amounts of high-dimensional, complex data. Many algorithms in computational intelligence rely on probability distributions, and such data can follow unusual distributions that challenge traditional modeling methods. (For example, they are typically not textbook distributions such as the Gaussian.) In some applications, the inputs to the algorithms are themselves probability distributions. Existing techniques cannot both capture unusual distributions and scale to millions of data points without stalling the computation. There is a pressing need for a flexible, efficient framework for representing, learning, and reasoning about datasets arising from these problems. This project will address these challenges by developing a novel and unified framework to represent, model, learn, and use probability distributions in computational intelligence. To evaluate the utility of the new techniques, the project will test them on difficult real-world problems in computer image analysis, materials science, and flow cytometry (a biotechnology technique used for cell counting, cell sorting, and protein engineering). The project, an NSF CAREER award, will integrate the research results with several education initiatives. New curricula will be designed for both undergraduate and graduate students, with emphasis on students from under-represented groups. A new online course will be created to make the results accessible to students in massive online master's programs. Finally, advanced high school math teachers will be engaged to design problems related to the research for use in a math competition for advanced high school students.
This project will (1) create a novel and unified nonparametric kernel framework for distributional data and distributions with fine-grained statistical properties, and (2) develop principled and scalable algorithms for nonparametric analysis of big data. The unified kernel embedding framework will significantly advance large-scale nonparametric data analysis and will help bridge traditionally separate research areas in data analysis, including kernel methods, graphical models, optimization, nonparametric Bayesian methods, functional analysis, and tensor data analysis. In addition to advances in algorithmic methods, the applications to large-scale image classification, flow cytometry, and materials property prediction have the potential for transformative impact on society.
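As a rough illustration of what a kernel embedding of a distribution can look like in practice, the sketch below compares two samples through their empirical kernel mean embeddings, using the squared maximum mean discrepancy (MMD) with a Gaussian kernel. This is a standard textbook construction and not necessarily the project's own method; the function names, the bandwidth value, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gram matrix with entries exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared distance between two mean embeddings."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic data (assumed for illustration): two Gaussian samples and one
# bimodal sample that a single Gaussian would describe poorly.
rng = np.random.default_rng(0)
gauss_a = rng.normal(0.0, 1.0, size=(500, 2))
gauss_b = rng.normal(0.0, 1.0, size=(500, 2))
bimodal = np.concatenate(
    [rng.normal(-2.0, 0.5, size=(250, 2)), rng.normal(2.0, 0.5, size=(250, 2))]
)

mmd_same = mmd_squared(gauss_a, gauss_b)   # samples from the same distribution
mmd_diff = mmd_squared(gauss_a, bimodal)   # samples from different distributions
print(f"same distribution: {mmd_same:.4f}, different: {mmd_diff:.4f}")
```

No parametric form is assumed for either sample, which is the appeal of the nonparametric approach. The pairwise Gram matrices cost quadratic time and memory in the sample size, which hints at why scalable algorithms matter for data sets with millions of points.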