
EAGER: Using Search Engines to Track Impact of Unsung Heroes of Big Data Revolution, Data Creators

$154,535 · FY2018 · CSE · NSF

University of California-Riverside, Riverside, CA

Investigators

Abstract

Efficient mechanisms of data exchange are increasingly central to science (and society) amid the deluge of complex data. Because of the pace and complexity of data production, data producers often cannot adequately analyze much of the data they generate, and, thanks to its rapid dissemination, the broader community participates in its analysis. This model, however, risks neglecting the contributions of the original data creators as attention shifts to data integrators and analysts. The current paradigm for disseminating information and assigning credit in science, which relies on peer-reviewed publications and on formal acknowledgment of third-party contributions through citations, is biased toward high-profile, well-known scientists and research centers that participate in the later stages of knowledge creation and dissemination. We propose to use unbiased internet searches to identify uncredited use of datasets and resources in the research literature, allowing data creators and researchers involved in the early stages of data creation to claim credit for their work. Using machine-learning and text-mining techniques, the PI seeks to extract relevant information from the noisy results of general-purpose search engines and to develop easy-to-use public interfaces to these resources that supplement official bibliometric resources.

Acceptance and citation biases significantly affect the careers of researchers outside the central foci of funding and publication, which are also typically places with more diverse research workforces. First, the probability of rejection in peer review is significantly biased against less-famous scientists and those at less research-intensive institutions. These biases are less likely to affect data creation, because databases typically accept data without peer review and the value of data can be measured by its use.

Second, the same biases affect the number of citations: researchers tend to cite more-famous, established scientists, or to cite reviews, which are often invited, and usually only leading scientists are invited to write them. As a result, both publications and citations are heavily skewed toward already-recognized scientists who, compared with the general population of scientists, represent less diverse populations, both in personal terms and in terms of the institutions where they work. Such biases affect the careers and grant-funding prospects of young scientists working outside the tight collaboration networks of top research institutions, and they create a classic rich-get-richer, poor-get-poorer loop. The new internet-based paradigm of information exchange has already had a profound effect on researchers' ability to bring their results into prominent public view. This proposal pursues approaches that would further alleviate publication and citation bias, not by addressing it directly, but by developing more open ways of evaluating contributions to science.
