GGrantIndex
← Search

EAGER-DynamicData: Principled and Scalable Probabilistic Frameworks for Dynamic Multi-modal Data

$100,000FY2015ENGNSF

Duke University, Durham NC

Investigators

Abstract

Emergence of the Big Data phenomenon has given rise to data collections that are massive, highly heterogeneous and multi-modal, dynamically evolving, as well as incomplete, noisy and imprecise. These characteristics are becoming increasingly prevalent in data from a diverse range of domains, such as robotics, cognitive neuroscience, sensor generated data (e.g., in geoscience and remote sensing), and the dynamically evolving data on the web. The heterogeneity, complexity, dynamic evolution, and the often real-time processing requirements, call for methods that are both statistically rigorous as well as computationally scalable. Moreover, performing fast feature-extraction and/or predictions at *test time* is another key requirement, especially in problems involving dynamic data arriving at high speeds. This project will innovate on scalable statistical methods for learning from such massive dynamic multi-modal data, with a focus on designing novel probabilistic models for multi-layer latent feature extraction for such data. These multi-layer latent feature representations of the data will help capture the underlying dynamics and allow reconciling the data heterogeneity arising due to diverse data types and widely differing spatial and temporal resolutions across the different modalities, while also being useful for a wide range of fundamental data analysis tasks, such as classification, clustering, and predicting missing data. At the same time, the focus will also be on developing methods that are efficient at test time, so that fast feature extraction and predictions can be made in real time, to make these methods readily applicable to dynamic streaming data. This EArly Grant for Exploratory Research (EAGER) project endeavors to move beyond existing ad hoc approaches currently used for these problems, and develop a probabilistically grounded, statistically rigorous, and computationally scalable framework, based on Bayesian and nonparametric Bayesian modeling. Taking a Bayesian generative modeling approach will naturally enable modeling the dynamic behavior of the data and seamlessly integrate diverse types of data, while handling issues such as missingness, noise and the imprecise nature of the data. In addition, the nonparametric Bayesian treatment will provide the much-needed modeling flexibility and address many of the limitations of the existing Deep Learning models, e.g., by doing away with the need of extensive hand-tuning, incorporating rich prior knowledge about the model parameters, and allowing a natural sharing of statistical strength across the multiple data modalities. To handle the associated computational challenges, the framework will provide novel inference machinery in form of online Bayesian inference methods that will naturally handle dynamic, real-time data, and parallel and distributed Bayesian inference methods to handle massive multi-modal data that are too large for the capacity (storage and/or computational) of a single computing node. Furthermore, due to its inherent ability of quantifying model uncertainty, the proposed Bayesian framework will naturally facilitate a dynamic integration between model computation (inference) and data acquisition, and help design informed data acquisition (i.e., "active" sensing) methods in the context of dynamic multi-modal data. An overarching goal of this project is to also help synergize two important research directions in machine learning - nonparametric Bayesian methods and Deep Learning methods. By designing scalable nonparametric Bayesian solutions to the type of problems Deep Learning methods have been applied for, the project will convince the skeptics of Deep Learning methods to adopt these methods more openly. At the same time, the compelling range of problems and applications Deep Learning are being used for, will broaden the appeal of nonparametric Bayesian methods from a practical sense. We expect this synergy between these two areas will significantly advance the state-of-the-art in both areas.

View original record on NSF Award Search →