Collaborative Research: RI: Small: Foundations of Few-Round Active Learning

$300,000 | FY2023 | CSE | NSF

Virginia Polytechnic Institute And State University, Blacksburg VA

Investigators

Abstract

Supervised machine learning has found widespread application, often achieving state-of-the-art performance. However, these algorithms rely on labeled training instances, which can be challenging to acquire. Labels are often produced by human annotators and require time and money to obtain. Active Learning strives to minimize labeling costs by identifying the most informative instances for annotation. While Active Learning techniques have shown promise in producing high-performance models with fewer labels, their applications remain constrained by the need for multiple rounds of interaction with annotators, which can be time-consuming or infeasible. This project aims to advance Active Learning algorithms and the understanding of their fundamental capabilities in scenarios with limited interaction rounds. A broad spectrum of machine learning applications is expected to benefit from the results of this research, which should reduce the time and cost of obtaining sufficient data for training accurate models. Additionally, this project engages underrepresented minority students through hands-on research and learning activities, develops course modules on resource-efficient machine learning, and disseminates its findings to industry and academia via an extensive online Active Learning tutorial.

This project will launch a comprehensive investigation of few-round active learning, where the learner can actively request feedback on specific data points within a limited number of rounds. To achieve this, the project will interleave two algorithmic tasks: robust data utility quantification and planning with limited adaptivity. First, the investigators will explore methods to measure the utility of unlabeled data, taking into account data size, underlying data characteristics, and downstream learning tasks. Subsequently, the team will develop algorithms that optimize the data utility metric while simultaneously improving the metric's quality over time in a few-round active learning setting. The project findings will establish principled approaches for addressing a novel exploration-exploitation dilemma specific to few-round active learning and provide a fundamental understanding of adaptivity's role in budgeted learning. Finally, the project will evaluate the proposed approaches across various high-impact machine learning applications, including autonomous driving, smart buildings, dialog systems, and biochemical engineering.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
