Crowdsourcing Labels and Explanations to Build More Robust, Explainable AI/ML Activity Models

$305,585 · R01 · FY2023 · AG · NIH

Washington State University, Pullman WA

Investigators

Diane Joyce Cook (contact), Maureen Schmitter-Edgecombe

Linked publications & trials

Papers: 39342453, 37486844, 37457914, 36381500, 36220111, 35822085, 35685277, 35018368, 34776442, 34584490, 34336375, 34017671, 32957853, 32750924, 32189568, 32070942, 31658865, 31443058

Abstract

PROJECT SUMMARY / ABSTRACT

As the population of individuals aged 65 and older grows from 58 million to 88 million by 2050, so too will the number of individuals who are aging with Alzheimer's disease and related dementias (ADRDs) [1]. The parent project introduces clinically driven technological methods to automate assessment of an older adult's functional health from multimodal sensor data. What is lacking in the community, and in our parent project, is the availability of ground-truth smartwatch activity labels. Without a sufficient amount of labeled data, machine learning models cannot learn robust behavior models or use these models for functional health prediction. Additionally, the labels that do exist are highly skewed across activity categories, which further limits machine learning performance because of the classic imbalanced class distribution problem. In this supplement request, we propose to dramatically increase the availability of labeled smartwatch data for our parent project and for the field. To do this, we will create a mechanism to crowdsource activity labels through Amazon Mechanical Turk (AMT). We will also capitalize on the crowdsourcing opportunity to push the parent project to the next step by laying a foundation for explainable machine learning models. Once our target number of activity labels is reached, we will initiate a second round of crowdsourcing by asking citizen scientists to write one-sentence explanations of the visualized data corresponding to an activity instance. The supplement project will contain four tasks. First, we will create a visualization and data point-selection tool for use on the AMT platform for data collection. Second, a baseline active learning strategy will be used to collect an initial set of labels and create a baseline model. Third, the active learning and annotator selection strategies will be refined to collect the remainder of the activity labels.
Finally, diverse data points from each modeled category will be displayed to collect a set of text captions for training explanation models. The outcome of this supplement project will be one of the largest sets of activity-labeled smartwatch data collected “in the wild.” The labeled datasets created by this supplement will offer a foundation for a multitude of health studies that can use activity information derived from continuous wearable sensor readings collected in real-world settings. For the parent project, the amount of labeled data will increase by over 10,000%, that is, to more than 100 times its current size. The supplement will also offer a starting point for creating explainable mobile health AI/ML tools.
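The abstract does not say which baseline active learning strategy is used or how class imbalance is handled during querying. As a minimal sketch under those unknowns, the code below assumes pool-based least-confidence uncertainty sampling (a common baseline, not necessarily the authors' choice) and scales each unlabeled window's uncertainty by an inverse-frequency weight for its predicted class, so that rare activity categories are queried more often. The function names (`class_weights`, `least_confidence`, `select_queries`) and the activity labels are hypothetical illustrations.

```python
from collections import Counter
from typing import Dict, List, Sequence


def class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights n / (k * count_c): rare activity classes get larger weights.

    Hypothetical helper; mirrors the common "balanced" class-weight heuristic.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def least_confidence(probs: Dict[str, float]) -> float:
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - max(probs.values())


def select_queries(pool_probs: List[Dict[str, float]],
                   weights: Dict[str, float],
                   budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled windows with the highest weighted uncertainty.

    Assumption: a predicted class with no labels yet gets the largest known weight.
    """
    fallback = max(weights.values())

    def score(i: int) -> float:
        probs = pool_probs[i]
        predicted = max(probs, key=probs.get)
        return least_confidence(probs) * weights.get(predicted, fallback)

    ranked = sorted(range(len(pool_probs)), key=score, reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Illustrative skewed label set: "walk" is common, "cook" is rare.
    labeled = ["walk"] * 8 + ["cook"] * 2
    weights = class_weights(labeled)  # {"walk": 0.625, "cook": 2.5}

    # Model-predicted class probabilities for three unlabeled smartwatch windows.
    pool = [
        {"walk": 0.90, "cook": 0.10},
        {"walk": 0.55, "cook": 0.45},
        {"walk": 0.40, "cook": 0.60},
    ]
    print(select_queries(pool, weights, budget=2))  # [2, 1]
```

In this example the third window is queried first even though it is slightly less uncertain than the second, because it is predicted to belong to the rare "cook" class. In a crowdsourcing setting, the selected indices would be the data points shown to annotators next.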
