HDR TRIPODS: Institute for Integrated Data Science: A Transdisciplinary Approach to Understanding Fundamental Trade-offs and Theoretical Foundations

$1,500,000 | FY2019 | CSE | NSF

University of Massachusetts Amherst, Amherst, MA

Abstract

Many areas of science, engineering, and industry are already being revolutionized by the adoption of tools and techniques from data science. However, a rigorous analysis of existing approaches, together with the development of new ideas, is necessary to a) ensure the optimal use of available computational and statistical resources and b) develop a principled and systematic approach to the relevant problems rather than relying on a collection of ad hoc solutions. In particular, there are many interrelated questions that arise in a typical data science project. First is the acquisition of relevant data: Can data be collected interactively, and might this reduce the costs of data acquisition? Is the data noisy, and how might this impact the results? Second is the processing of data: If the data cannot fit in the memory of a single machine, how can we minimize the communication costs within a cluster of machines? When are approximate answers sufficient, and how does the required accuracy trade off with the computational resources available? Third is the predictive value of the available data: Can the uncertainty of the final results be quantified? How can the modeling assumptions used by our algorithms be efficiently evaluated? This award supports a data science institute with the main goal of developing an understanding of the fundamental mathematical and computational issues underlying the aforementioned questions. Ultimately, this will enable practitioners to make more informed decisions when investing time and money across the life cycle of their data science project. Achieving this goal necessitates a transdisciplinary approach, and the team of investigators includes experts in theoretical computer science; applied and computational mathematics; machine learning and statistics; and coding and information theory.
In addition to pursuing the above research goals, the institute will coordinate education and training activities and develop resources for the research community.

Specific research goals explored in this project include:
1) Understanding the trade-off between rounds of interactive data acquisition and statistical and computational efficiency.
2) Minimizing query complexity in interactive unsupervised learning problems.
3) Understanding space/sample complexity trade-offs when processing stochastic data.
4) Developing fine-grained approximation algorithms relevant to core data science tasks.
5) Using coding theory to enable communication-efficient distributed machine learning.
6) Designing variational inference methods with statistical guarantees given limited resources.
7) Developing a principled approach to exploiting trade-offs between bias, model complexity, and computational budget.

Specific institute activities include:
1) Technical workshops and training activities for researchers in domain sciences.
2) A virtual speaker series.
3) Education initiatives including the development of new courses that will teach foundational topics in data science and resources that can be used across different institutions.

The grant will also train postdoctoral scholars and undergraduate researchers. This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.