
Developing Evidence-based Data Sharing and Archiving Policies

$498,643 · FY2019 · SBE · NSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI


Abstract

Access to original research data supports innovative, interdisciplinary, and integrative research, and enables replication and review of prior work. Consequently, a growing number of funding agencies, journal publishers, and scientific societies now require that original research data be shared and archived promptly after collection or publication. However, many questions about the best way to share and archive research data remain unanswered. For instance: How can data repositories best allocate their limited resources across the different aspects of data archiving and processing? What is the most effective way to make data usable by the broadest audience? Which data sharing policies most effectively achieve stakeholders' transparency and innovation goals?

This project addresses these questions by studying the impact of different "curatorial actions" (e.g., standardizing variables, improving documentation) on the reuse of data archived by the Inter-university Consortium for Political and Social Research (ICPSR). As one of the largest social science archives in the world and a leader in digital data curation practice, ICPSR is well suited as a site for this project. ICPSR is also well positioned to provide funding agencies and policymakers with recommendations for data sharing policies, including the metrics needed to evaluate the appropriateness of data sharing and curation plans and their associated costs. This project achieves broader impacts by (1) recommending evidence-based data sharing policies to funders, repository staff, and researchers and (2) improving research data curation practices. To determine the impact of various curatorial activities on data reuse, the project first defines the different kinds of "curatorial actions" and "impact," and then explains the relationships between actions and impact.
To identify curatorial actions and other features of datasets and ICPSR services that influence reuse, the project examines ICPSR's legacy curation logs and use records (such as downloads and citations). Curation logs record specific data transformations or preservation steps. By connecting curation logs to data usage records, the project will identify the actions associated with higher rates of reuse or access. The project also examines the utility of two measures of impact, secondary impact and diversity, by comparing use logs to the ICPSR Bibliography of Data-Related Literature, which links over 80,000 research publications to the ICPSR data on which they are based. "Secondary impact" measures how many times the reuse publications have been cited; it is constructed by gathering citation data for all items in the bibliography that are not the original PI's publications. "Diversity" measures the breadth of disciplines that use the data and can likewise be constructed from the bibliography.

The project employs multivariate regression analysis and structural equation modeling to determine the relationships among curatorial actions, metadata, the dataset itself, ICPSR services, and reuse and impact. This analysis enables the development of cost models and metrics that allow repository managers to evaluate the return on investment of specific curatorial actions. The project will use these models to inform evidence-based data sharing and archiving policies. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
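The abstract does not say exactly how these measures are computed. The following minimal Python sketch shows one plausible way to link a curation log to download counts and to build secondary impact and diversity from bibliography entries. Everything in it is an assumption for illustration: the record layouts, the field names (`study`, `action`, `authors`, `citations`, `discipline`), the helper functions, and all numbers are hypothetical and are not ICPSR's schema or data. It treats any publication that lists the original PI as an author as the PI's own work. Alongside a simple count of distinct disciplines, it reports Shannon entropy of discipline shares, which is a swapped-in measure and is not named in the abstract.

```python
from collections import Counter
from math import log

# Hypothetical toy records; field names and values are illustrative only.
curation_log = [
    {"study": "S1", "action": "standardize_variables"},
    {"study": "S1", "action": "improve_documentation"},
    {"study": "S2", "action": "improve_documentation"},
    {"study": "S3", "action": "standardize_variables"},
]
downloads = {"S1": 120, "S2": 45, "S3": 90, "S4": 30}

# Each hypothetical bibliography entry links one publication to the study it used.
bibliography = [
    {"study": "S1", "authors": {"pi_a"}, "citations": 50, "discipline": "sociology"},
    {"study": "S1", "authors": {"x"}, "citations": 12, "discipline": "economics"},
    {"study": "S1", "authors": {"y"}, "citations": 8, "discipline": "public health"},
    {"study": "S1", "authors": {"z", "pi_a"}, "citations": 5, "discipline": "sociology"},
    {"study": "S1", "authors": {"w"}, "citations": 3, "discipline": "sociology"},
]
original_pi = {"S1": "pi_a"}


def mean_downloads_by_action(action):
    """Mean downloads for studies with vs. without a given curatorial action."""
    treated = {r["study"] for r in curation_log if r["action"] == action}
    with_action = [n for s, n in downloads.items() if s in treated]
    without_action = [n for s, n in downloads.items() if s not in treated]
    return (sum(with_action) / len(with_action),
            sum(without_action) / len(without_action))


def reuse_publications(study):
    """Publications based on the study that do not list the original PI (assumed rule)."""
    pi = original_pi.get(study)
    return [p for p in bibliography
            if p["study"] == study and pi not in p["authors"]]


def secondary_impact(study):
    """Total citations received by the study's reuse publications."""
    return sum(p["citations"] for p in reuse_publications(study))


def diversity(study):
    """Return (distinct discipline count, Shannon entropy of discipline shares)."""
    counts = Counter(p["discipline"] for p in reuse_publications(study))
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    entropy = -sum((c / total) * log(c / total) for c in counts.values())
    return len(counts), entropy


print(mean_downloads_by_action("standardize_variables"))  # (105.0, 37.5)
print(secondary_impact("S1"))  # 23
print(diversity("S1"))  # three disciplines, entropy equal to ln(3)
```

A raw difference in means like this does not control for confounders such as study age or subject area; adjusting for those is the role of the multivariate regression and structural equation modeling the abstract describes.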
