CAREER: A Stable Foundation for Trustworthy Data Analysis

$508,000 · FY2018 · CSE · NSF

Northeastern University, Boston MA

Investigators

Abstract

Every day, massive amounts of data are collected, analyzed, and used to make high-stakes decisions, raising many questions about how to use this data in a trustworthy manner. This project addresses two such questions: (1) How can researchers prevent false discovery, and use data to learn meaningful facts about a population without overfitting to that data? Despite decades of research into methods for preventing false discovery, it remains a vexing problem for the scientific community. (2) How can researchers use valuable but sensitive data to learn about a population without compromising the privacy of individuals in that data? This task has proven to be quite delicate, and there have been several high-profile attacks on supposedly anonymous datasets, causing a lack of confidence in the most commonly used approaches.

Although they may seem unrelated, both of these questions can, surprisingly, be addressed using stable algorithms, that is, algorithms that are insensitive to small changes in their inputs. In the past decade, differential privacy emerged as a strong form of algorithmic stability that guarantees a high degree of individual privacy, yet admits highly accurate data analysis. More recently, differential privacy has been shown to prevent false discovery in interactive data analysis, the common scenario in which the same dataset is analyzed repeatedly and which has been implicated in a "statistical crisis in science." This project will take a unified approach to advancing the state of the art in privacy and false discovery via algorithmic stability. The main outcomes of this project will be building the theoretical foundations of interactive data analysis, developing new computationally efficient stable algorithms for central problems in these areas, understanding the limits of privacy and interactive data analysis both in theory and in practice, and broadening the reach of algorithmic stability to address other challenges in trustworthy data analysis.
