
CAREER: Statistical Inference Under Information Constraints: Efficient Algorithms and Fundamental Limits

$552,654 | FY2019 | CSE | NSF

Cornell University, Ithaca NY

Investigators

Abstract

Data science and machine learning systems must operate under constraints on the availability of data, computation time, storage memory, and privacy. For example, when performing web search on mobile devices, one would like the applications to be small in size, communicate as little data as possible, and leak as little information about the user as possible. These constraints are often at odds with each other: a system that provides strong privacy guarantees might require more data and computation, and a system that uses little data might require more computation. A fundamental understanding of the limits and trade-offs between constrained resources such as samples, time, memory, communication, and privacy is critical for tackling the many challenges in data science that lie ahead. Despite the many success stories of data science, these trade-offs are poorly understood even in some of the simplest settings. This project aims to establish the fundamental trade-offs between these resources and to design efficient schemes that achieve them. The project outcomes can help design faster, communication-frugal, privacy-preserving, and space-efficient learning systems. The project also seeks to involve a diverse group of researchers through outreach activities that target undergraduate students and under-represented communities.
The investigator will formulate and study fundamental statistical inference tasks such as distribution estimation, hypothesis testing, and distribution property estimation under the information constraints mentioned above. A particular direction of interest is how the availability of shared randomness affects the other constraints in distributed machine learning systems. While the role of randomness has been studied in communication complexity problems, its role in machine learning systems is often overlooked.
The project will integrate ideas from computer science, information theory, machine learning, and statistics, seeking to connect researchers across these communities. All findings of this project will be disseminated through publications and made publicly available on the investigator's website. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
