CIF: Small: Information Theory Meets Deep Learning: Universal Probability and Common Information for High-Dimensional Data

$500,000FY2019CSENSF

University Of California-San Diego, La Jolla CA

Investigators

Abstract

Information theory is a scientific discipline that studies how well we can communicate, compress, and otherwise process various signals and data. Over the past 70 years, the tenets of information theory have driven the digital transformation of our daily lives, with ubiquitous and reliable communication of 0s and 1s over wireless networks as an indispensable operation in diverse public and private enterprises. Beyond such "bits," however, conventional methods in information theory often encounter several limitations when dealing with real-world data of multiple dimensions, large alphabets, and complex spatial dependence. This project aims to develop new mathematical tools and engineering techniques for handling such complicated real-world data by incorporating deep learning into the tenets of information theory. In this synergistic combination, information theory provides performance guarantees and a systematic decomposition of complex problems. Deep learning, in turn, provides working procedures for efficient processing of high-dimensional data. This project explores two concrete research directions in which the combination prompts new data science frameworks. The first direction builds on the notion of universal probability, which is a close proxy to the unknown distribution of the data. Although conventional algorithms based on universal probability converge rather slowly for high-dimensional, large alphabet, long-range dependent data, the proposed framework leverages deep neural networks to efficiently learn the universal probability distribution across multiple data contexts by aggregating them smoothly. The resulting combination of information theory and deep learning provides a new paradigm to address general data science problems in a principled yet pragmatic manner, with theoretical performance guarantees and practical performance improvements. The second direction is inspired by the notion of common information in information theory, and develops a new neural network model for statistical inference among multiple high-dimensional data sets (for example, drawing a picture automatically that describes a given text the best). This model and associated training algorithms can be applied to various data processing tasks such as joint and conditional generation as well as supervised or semisupervised learning of high-dimensional output data. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →