CAREER: From Rare Events to Competitive Learning Algorithms

$545,722FY2022CSENSF

University Of Illinois At Chicago, Chicago IL

Investigators

Abstract

A sense of optimism underlies the mass adoption of data-driven algorithms. Such algorithms will be used not only to match current scientific and engineered solutions but also, given enough data, to rival years of future effort. This project proposes framing optimism in terms of competitive learning, where a single algorithm performs as well as a family of bespoke algorithms, each for a specific true nature. Thus, even without knowledge of the truth, a competitive algorithm is guaranteed to deliver the best feasible performance in each case. This approach shows that much more is possible than the pessimistic outlook of judging performance against a worst-case nature. It can also help better understand the limits of (human) discovery, based on how amenable nature is to being discovered. By cutting right to the heart of the philosophy of acquiring knowledge from observation, this work weaves research with data science education efforts at several levels, to help nurture and train the next generation of data scientists and engineers. The project fully explores the competitive perspective for contextual distribution learning, a problem that permeates data science and machine learning, with natural language modeling as a specific use case. Here, the goal is to demonstrate an algorithm that learns to predict, by building a stochastic matrix or tensor from data. It is successful if its performance rivals that of a rich array of specialized algorithms aware of potential underlying structures, such as rank, sparsity, or low manifold dimension, despite no single algorithm being worst-case optimal. The larger premise of the project is that the principles that enable competitiveness are fundamental and cut across multiple domains. These include: (i) the back-off principle, which demarcates between abundant and rare data regions and applies structure only in the latter, where it is needed, achieving competitiveness through nimbleness, (ii) the empirical-Bayes principle, which offers a mechanism to share data across rarely represented regions, allowing them to help each other, and (iii) tail structures, which ground the first two principles in a tractable framework and can be used to establish fundamental limits, by characterizing conditions that are necessary and sufficient for competitiveness. By rigorously establishing the foundations of these principles, the goal of the project is to streamline the design of competitive algorithms that simultaneously find the truth of nature and adapt to it. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →