CAREER: Foundations of Small Data

$549,053 | FY2022 | CSE | NSF

University Of Pennsylvania, Philadelphia PA

Investigators

Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). Deep learning is a technique in which one builds an artificial neural network that mimics the workings of neurons in the biological brain. This technique drives a wide range of tasks today, e.g., predicting the next word while typing on a phone, tagging photos with the names of the people in them, and transcribing speech. Building the artificial network requires collecting a large amount of data for each of these tasks. But as we seek to apply deep learning to more and more diverse tasks, it is becoming difficult to collect such large amounts of data for every task. For example, many languages and dialects have far fewer speakers than English or Spanish, and so their data is scarcer. This data scarcity is even more acute in domains such as the clinical sciences. The goal of this project is to develop theoretical and computational tools that enable artificial neural networks to work well even with little data. Educational and outreach goals of this project include (a) development of new curricula for graduate and undergraduate students, (b) mentoring trainees who work across established disciplines such as computer science, physics and engineering, and (c) fostering an ecosystem for machine learning across high schools, higher-education institutions and industry in the Greater Philadelphia region.

To achieve these goals, this project will develop a foundational understanding of learning tasks. It will study how typical learning tasks have an effective low-dimensional structure that enables deep networks to learn them efficiently. It seeks to characterize the geometry of the function space of predictive models fitted on typical tasks, in order to understand when learning one task helps, or does not help, reduce the amount of data required to learn another task. It aims to exploit this geometry to build Bayesian priors that automatically adapt to the amount of available data. Such methods are expected to reduce the amount of labeled data required for training by up to a factor of 1000. This theory will be used to develop new methods for transfer, multi-task and continual learning, and tools that enable accurate diagnosis of Alzheimer’s Disease.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
