
CAREER: Advancing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management

$302,530 · FY2017 · CSE · NSF

University of Illinois at Urbana-Champaign, Urbana, IL

Abstract

Machine learning on big data is finally having an impact on our daily lives, from small triumphs like Siri and Google Translate to much tougher emerging applications like driverless cars and computer-assisted medical image diagnosis. From mundane online fraud detection to the most sophisticated uses of computer vision, these applications share an insatiable appetite for massive labeled training data. The primary source of high-quality labels is crowdsourcing. Research to date has focused on maximizing the number of high-quality crowdsourced labels produced per dollar spent, for problems where workers must choose among just a few predefined labels. However, more open-ended labeling problems have grown to constitute almost half of crowdsourced tasks today, and open-ended tasks raise an entirely new set of research challenges for crowdsourced data management. This activity addresses the key new research challenges in managing and optimizing open-ended crowdsourcing. Because open-ended tasks present a large number of alternatives, human workers struggle to select the error-free ones. Additional challenges emerge in determining the open-ended task types appropriate for a specific problem, developing schemes to ascertain the right answer given open-ended worker responses, and inferring the hidden perspectives behind worker answers. The activity targets open-ended crowdsourcing problems that span nearly 90% of those used in practice today, with wide applicability in computer vision, natural language processing, and machine learning in general. The technical outcomes of the activity include the first foundational principles for open-ended crowdsourced data management, which in turn will expand the reach of machine learning into new and more challenging domains and enable more effective solutions in existing applications that impact our everyday lives.
The pedagogical outcomes of the activity include a course on human-in-the-loop data analytics, crowdsourcing education modules for school teachers, a quantification and dissemination of how crowdsourcing is performed in practice, and a benchmark to accelerate future crowdsourcing research.
