
A massive study of data science to address the scientific reproducibility crisis

$364,500 · R01 · FY2019 · GM · NIH

Johns Hopkins University, Baltimore MD

Abstract

DESCRIPTION (provided by applicant): There is a crisis of reproducibility and replicability of scientific results. This crisis is an increasing source of concern in both the scientific and popular press. The crisis is so acute that the United States Congress is currently investigating reproducibility of the scientific process. At the heart of the crisis is a shortage of data analytic skill throughout the scientific enterprise. There is an emerging consensus that the best way to address the crisis is to increase data analytic training, particularly around reproducibility and replicability. In this application we (1) propose the first formal statistical model for reproducibility and replicability and then use data and experiments from the largest massive online open program in data science in the world to (2) perform randomized studies to improve our knowledge about which statistical methods and protocols lead to increased reproducibility and replicability in the hands of average users and (3) analyze learner, course, and content characteristics that increase learner success and throughput, in order to increase the number of trained data analysts worldwide. To accomplish goals (2) and (3) we will use the largest and highest-throughput data science program in the world: the Johns Hopkins Data Science Specialization. This specialization, developed by the investigators of this project, consists of nine courses that are offered every month. Since the launch of this program in April 2014, these classes have seen more than two million enrollments, and nearly all of the learners' experiences have been recorded as data. Furthermore, the MOOC platform for this series permits random assignment of quiz questions and content. We will disseminate our results through open-source software, analysis protocols, our popular blog, and the Data Science Specialization to maximally improve data science training and reduce the scientific replication and reproducibility problem.
The size of this program means that by increasing the quality of the program and the number of students who complete it by even a small percentage, we can affect global data analytic behavior.
