A Feasible Uniform Standard for Deep Citation of Social Science Data
Harvard University, Cambridge MA
Abstract
This Political Science Infrastructure project creates a uniform citation standard for social science data sources. Like the standards for citing textual sources, such a standard facilitates connections between research and researchers, thereby improving the scientific process and advancing the accumulation of knowledge. Standard data source citations make replication both more common and easier, which increases researcher productivity and allows more resources to be devoted to new research. Science progresses as scientists create more links with other scientists. Text references are so routine that their fundamental importance is taken for granted. Yet although quantitative work has grown over the last 50 years to roughly half of what journals publish, there is no comparable citation standard for data. This project builds a simple yet critical piece of infrastructure, a uniform standard for deep citation of data, which offers significant savings of time and resources and consequent gains in research productivity.
One might expect the explosive growth of electronic publishing and electronic data dissemination to solve the problem of linking citations to source data. The amount of knowledge in circulation, and the rate at which it circulates, have certainly increased dramatically, yet in the absence of uniform standards for citing data the problem is being exacerbated, not resolved. Given the short average lifetime of URLs on the web, the growing citation of online data sets is a problem, or possibly an opportunity, but definitely not an answer. An online source cited in a manuscript printed today may not be the same source available at that web address tomorrow, let alone when the manuscript is published or at some later date. Citations to sources that cannot be retrieved are useless.
Deep citation of text means that one text source can unambiguously reference another source, or any portion of it, in a way that lets another reader retrieve that source years or decades later. For books, the author, title, publisher, and page number are enough to locate any specific phrase referenced. Deep citation of data means the same thing but raises new technological issues, which this project addresses. Readers need to be able to retrieve the original data set in the same version, identify the same variables, apply the same recodes, and in some instances conduct the same analysis. To facilitate this process, the investigators develop uniform citation standards for social science publishing and create a test-bed of prototype tools for electronically linking data citations to source data.
Political science as a discipline has historically been at the forefront of building social science infrastructure, including the creation of the world's largest archive of social science data (the ICPSR), the development of the first general-purpose commercial statistical package (SPSS), and the ongoing data dissemination work of the Virtual Data Center. Each development originated within political science and greatly benefited political science research. But each also represents the discipline's continuing contribution to the infrastructure of scientific research well beyond its own intellectual boundaries. This project will have similarly broad implications. The investigators proceed on two fronts: first, developing a standardized digital signature for cited data; second, creating tools to link data citations electronically to their source data and to retrieve the data from those sources.
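The two fronts described above can be illustrated with a small sketch. The abstract does not say how the digital signature is computed, so the Python below simply assumes one: a SHA-256 hash over a canonical, type-tagged text rendering of the cited variables, stored in a citation record together with the author, title, version, and a persistent identifier rather than a URL. All names here (`data_fingerprint`, `DataCitation`, `verify`, the identifier string) are hypothetical illustrations, not the project's actual standard or tools. A reader holding such a citation could recompute the fingerprint on retrieved data and check that it is the same version with the same variables.

```python
import hashlib
import json
from dataclasses import dataclass


def _canonical(value):
    """Render one cell as a type-tagged string so 1 and 1.0 agree but 1 and "1" differ."""
    if value is None:
        return "NA"
    if isinstance(value, bool):
        return "b:" + str(value)
    if isinstance(value, (int, float)):
        # 15 significant digits is an assumed precision, not a published rule
        return "n:" + format(float(value), ".15g")
    return "s:" + str(value)


def data_fingerprint(rows, variables):
    """Hypothetical signature: SHA-256 over the variable names, then each row in order."""
    digest = hashlib.sha256()
    digest.update(json.dumps(list(variables)).encode("utf-8") + b"\n")
    for row in rows:
        cells = [_canonical(row[v]) for v in variables]
        digest.update(json.dumps(cells).encode("utf-8") + b"\n")
    return digest.hexdigest()


@dataclass(frozen=True)
class DataCitation:
    author: str
    title: str
    year: int
    identifier: str              # hypothetical persistent identifier, not a URL
    version: str
    variables: tuple[str, ...]   # the cited variables, in citation order
    fingerprint: str             # output of data_fingerprint at citation time

    def formatted(self) -> str:
        return (f"{self.author} ({self.year}). {self.title}, version {self.version}. "
                f"{self.identifier}. Variables: {', '.join(self.variables)}. "
                f"Fingerprint: {self.fingerprint}")


def verify(citation, rows):
    """True if the retrieved rows reproduce the fingerprint recorded in the citation."""
    return data_fingerprint(rows, citation.variables) == citation.fingerprint


if __name__ == "__main__":
    survey = [{"id": 1, "turnout": 0.52}, {"id": 2, "turnout": 0.47}]
    cite = DataCitation("Doe, J.", "Example Election Survey", 2000,
                        "hypothetical-id:example/0001", "2", ("id", "turnout"),
                        data_fingerprint(survey, ("id", "turnout")))
    print(cite.formatted())
    print(verify(cite, survey))    # True
    changed = [{"id": 1, "turnout": 0.53}, {"id": 2, "turnout": 0.47}]
    print(verify(cite, changed))   # False
```

Rendering every number through `format(float(x), ".15g")` makes the fingerprint insensitive to storage type (1 versus 1.0), so the same values read from different file formats hash alike, while the type tags keep the number 1 distinct from the string "1". The hash is deliberately sensitive to row order, variable order, and any recode, so a recoded variable yields a different signature. A real standard would have to settle these choices, along with rounding precision and missing-value handling, explicitly.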
To establish uniform data citation standards and create new tools for electronically linking data citations to their sources, the researchers capitalize on the digital library work already in progress at the Harvard-MIT Data Center on the development of a virtual data center (VDC). This project extends the VDC platform seamlessly at low marginal cost and adds substantial value to the VDC system as a public good in perpetuity. In this project the VDC is extended to an area for which it was not originally designed but for which it proves to be a very powerful tool.