CRII: CIF: Learning with Memory Constraints: Efficient Algorithms and Information Theoretic Lower Bounds
Cornell University, Ithaca NY
Investigators
Abstract
The trade-offs between resources such as the amount of data, the amount of storage, computation time for statistical estimation tasks are at the core of modern data science. Depending on the setting, some of the resources might be more valuable than others. For example, in credit analysis and population genetics, the amount of data is vital. For applications involving mobile devices, sensor networks, or biomedical implants, the storage available is limited and is a precious resource. This project aims to advance our understanding of the trade-offs between the amount of storage and the amount of data required for statistical tasks by (i) designing efficient algorithms that require small space and (ii) establishing fundamental limits on the storage required for these tasks. The research is at the intersection of streaming algorithms, which is primarily concerned with storage requirements of algorithmic problems, and statistical learning, which studies data requirements for statistical tasks. The investigators formulate basic statistical problems under storage constraints. The specific questions include entropy estimation of discrete distributions, a canonical problem that researchers from various fields including statistics, information theory, and computer science have studied. The paradigm of interest is the following: while the known sample-efficient entropy estimation algorithms require a lot of storage, it might be possible to reduce the storage requirements drastically by taking a little more than the optimal number of samples. The complementary side of the problem is purely information theoretic. In it, the researchers expect to develop general lower bounds that can be used to prove fundamental limits on the storage-sample trade-offs.
View original record on NSF Award Search →