CRII: CIF: Crowdsourcing-aware Learning
Carnegie Mellon University, Pittsburgh PA
Abstract
Machine learning has significantly advanced the state of the art in a variety of applications. These successes have required massive labeled datasets for training machine learning algorithms, and collecting these datasets usually involves human annotation. For instance, the training labels for supervised learning algorithms are often obtained through "crowdsourcing," where people label data over the Internet in exchange for monetary incentives. Most learning algorithms, however, are agnostic of this human-labeling process. This project designs improved learning algorithms by incorporating the "human" aspect of the data collection process into the machine learning objective. In more detail, this project considers supervised binary classification tasks in which the labels for the training data are obtained from people. The research involves the design of learning algorithms that jointly consider the human data-collection process (including the interfaces and incentives available to the human labelers) and the overall learning objective. Theoretical guarantees of optimality are derived and compared with the guarantees for algorithms that are agnostic of the human component. The algorithms and guarantees are based on models of human behavior from psychology, such as permutation-based models, which allow for maximal accuracy while making minimal assumptions about how the human labelers behave. The theoretical results are corroborated by practical implementations (open source) and real-world experiments (with data freely available online).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
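The abstract names permutation-based models without defining them. In one common formulation (an assumption here, not stated in this record), worker i labels item j correctly with probability Q[i][j] in [1/2, 1], and Q is only required to be nondecreasing in worker ability and item easiness once workers and items are reordered by some unknown permutation. The following Python sketch simulates one hypothetical member of that class and applies majority vote, a model-agnostic baseline of the kind the project aims to improve on. The product form of Q and every function name are illustrative assumptions, not the project's algorithms.

```python
import random


def make_permutation_model(n_workers, n_items, rng):
    """Build a hypothetical Q matrix: Q[i][j] = P(worker i labels item j correctly).

    Assumption for illustration: Q[i][j] = 0.5 + 0.5 * ability[i] * easiness[j].
    Any Q in [0.5, 1] that becomes nondecreasing along rows and columns once
    workers are sorted by ability and items by easiness belongs to the
    permutation-based class; this product form is just one member of it.
    """
    ability = [rng.random() for _ in range(n_workers)]
    easiness = [rng.random() for _ in range(n_items)]
    q = [[0.5 + 0.5 * a * e for e in easiness] for a in ability]
    return q, ability, easiness


def simulate_labels(q, truth, rng):
    """Each worker reports the true label (+1/-1) with probability q[i][j]."""
    return [
        [y if rng.random() < q_ij else -y for q_ij, y in zip(row, truth)]
        for row in q
    ]


def majority_vote(labels):
    """Model-agnostic baseline: sign of each column sum (ties go to +1)."""
    n_items = len(labels[0])
    return [1 if sum(row[j] for row in labels) >= 0 else -1 for j in range(n_items)]


def is_bivariate_isotonic(q, ability, easiness):
    """Check monotonicity after sorting rows by ability and columns by easiness."""
    rows = sorted(range(len(ability)), key=lambda i: ability[i])
    cols = sorted(range(len(easiness)), key=lambda j: easiness[j])
    m = [[q[i][j] for j in cols] for i in rows]
    rows_ok = all(m[r][c] <= m[r][c + 1]
                  for r in range(len(m)) for c in range(len(cols) - 1))
    cols_ok = all(m[r][c] <= m[r + 1][c]
                  for r in range(len(m) - 1) for c in range(len(cols)))
    return rows_ok and cols_ok


if __name__ == "__main__":
    rng = random.Random(0)
    q, ability, easiness = make_permutation_model(n_workers=15, n_items=200, rng=rng)
    truth = [rng.choice([-1, 1]) for _ in range(200)]
    labels = simulate_labels(q, truth, rng)
    estimate = majority_vote(labels)
    accuracy = sum(e == t for e, t in zip(estimate, truth)) / len(truth)
    print(f"bivariate isotonic after sorting: {is_bivariate_isotonic(q, ability, easiness)}")
    print(f"majority-vote accuracy: {accuracy:.3f}")
```

Majority vote weights every worker equally regardless of ability, which is the kind of "agnostic" treatment the abstract contrasts with; estimators designed for the permutation-based class can instead exploit the ordering structure of Q without assuming its exact parametric form.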