GGrantIndex
← Search

AF: SMALL: Extending the Reach of Distribution Testing via Structure

$600,000FY2023CSENSF

Massachusetts Institute Of Technology, Cambridge MA

Investigators

Abstract

We are inundated with a multitude of available data, much of which is naturally viewed as samples from a probability distribution over a large discrete domain. As there is typically no explicit description of the distribution, in order to make effective use of the data, one must develop efficient methods of determining which salient properties are held by the underlying distribution. Such distribution testing tasks are fundamental to scientific analysis, and recent years have seen a leap in our understanding of how to design such algorithms. However, the amount of sample data needed to successfully achieve many of these tasks can be prohibitively large. This project aims to develop algorithms which capitalize on known structures in the data in order to give significantly more efficient solutions. In addition, this project will develop sample-efficient methods of ascertaining whether the data indeed has the claimed structure. This project will include the organization of an annual Workshop on Local Algorithms (WOLA) and will produce publicly available educational material based on current research. The project will also include co-chairing a postdoctoral program specifically aimed at broadening participation and other mentoring activities. The project will engage with high school students in local public schools by giving presentations and serving on advisory committees. This project studies the role of structure in distribution testing. This research will lead to tools for understanding the tradeoffs between the sample complexity required to test properties of distributions and the strength of the assumptions made a priori on the distributions being tested. In a first thrust, algorithms will be developed which capitalize on known (or assumed) existing structural properties in the sample data to give significantly more efficient solutions for estimating information-theoretic quantities and determining whether the data additionally satisfies other structural properties. In a second thrust, new techniques will be developed for designing algorithms to determine whether the data in fact has the assumed structural properties. Unfortunately, for many natural structural properties, the task of testing whether data satisfies these structural properties can be expensive. Fortunately, in many important settings, it can be the case that testing whether the precise structural properties hold is unnecessary. Thus, in a third thrust, this project will consider techniques that bypass the difficulty of testing for structure by testing only that the data has enough structure to make it amenable for use in the settings for which the data is intended. Specifically, such methods allow one to safely use (possibly modified versions of) agnostic learning algorithms that depend on a weaker distributional assumption on the data. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →