RI: Small: Taming Massive Pre-trained Models under Label Scarcity via an Optimization Lens
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
Deep transfer learning (DTL) has made significant progress in many real-world applications such as image and speech recognition. Training deep learning models in these applications often requires large amounts of labeled data, (e.g., images with annotated objects). Labelling these data by human labor, however, can be very expensive and time-consuming, which significantly limits the broader adoption of deep learning. Such an issue is more pronounced in certain domains (e.g. biomedical domain), where labeled data are scarce. To address the concern of label scarcity, researchers have resorted to deep transfer learning, where a massive deep learning model is first pre-trained only using unlabeled data and then adapted to the downstream task of our interests with only limited labelled data. Due to the gap between the enormous sizes of the pre-trained models and the limited labeled data, however, such a deep transfer learning approach is prone to overfitting and fail to generalize well on the unseen data, especially when there are noisy labels. Moreover, the enormous model sizes make practical deployment very difficult when there are constraints on storage/memory usage, inference latency and energy consumption, especially on edge devices. This project aims to develop an efficient computational framework to improve the generalization of deep transfer learning and reduce the model sizes by leveraging cutting-edge optimization and machine learning techniques. Specifically, this project aims to develop: (I) new adversarial regularization methods, which can regularize the complexity of deep learning models and prevent overfitting of the training data, (II) new self-training methods robust to noisy labels in the training data, and (III) new optimization methods, which can improve the training of compact deep learning models in deep transfer learning. Moreover, we will develop new generalization and approximation theories for understanding the benefits of our proposed methods in transfer learning. The proposed research will also deliver open-source software in the form of easy-to-use libraries, which facilitate researchers and practitioners to apply DTL in related fields. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →