TLS: Assessing and Predicting Scientific Progress through Computational Language Understanding

$475,700 · FY2009 · SBE · NSF

University Of Chicago, Chicago IL

Investigators

James A Evans (contact), Andrey Rzhetsky, Ian Foster

Abstract

This project provides new approaches to the evaluation of scientific and technological promise. The basic insight is that scientific concepts, like organisms within ecologies, exist only in networks of supporting ideas, and that this is key to understanding how scientific concepts are adopted and diffused. The project studies chemistry and related disciplines. The work has three stages. First, it assesses and predicts innovation in science from the novelty and popularity of terms and statements within a scientific network. Second, it assesses and predicts the integration of scientific knowledge from term and statement linkage, repetition, and elaboration patterns. Finally, it assesses and predicts success along the path from science to technology by linking term and statement connections to problems. The research uses cybertools to develop a very large database of scientific terms and statements across a broad corpus of published research and invention, including news, blogs, and other informal text as well as unpublished opinions.

Intellectual Merit: Scientific evaluation, from awarding grants to reviewing tenure, has historically relied on quantity as a proxy for quality. Progress is inferred from the amount of research produced or the sum of attention garnered. Numbers of books, articles, pages, citations and media mentions are tallied. These quantities are inexpensive to measure, but they fail to capture directly whether a contribution is important. This project advances the measurement of scientific achievement by placing scientific claims in the context of past science. It does this by building on recent advances in computational language understanding and on the electronic availability of scientific text. In particular, the project extracts scientific terms and statements from a broad collection of published articles, patents and blogs in disciplines related to chemistry. These statements are supplemented with information about their social context: their location in the network of authors and the geographical sprawl of global research institutions. Models are then developed that exploit patterns in the structure of scientific language to assess the importance of scientific programs and fields. The degree of innovation in science is assessed from the novelty and popularity of terms and statements within the broader network. The integration of scientific knowledge is assessed by examining term and statement linkage, as well as repetition and elaboration patterns. These are, in turn, used to predict the path from science to technology. The project also develops new methods for managing and processing large quantities of text and network data.

Broader Impacts: The project develops general methods relevant for policy makers and scientists. This research generates, for example, high-resolution, dynamic maps of knowledge claims in chemistry and neighboring disciplines such as pharmaceuticals and toxicology. Because the maps are interactive, they can serve as a teaching tool to help students understand scientific trends in their corner of science. They can also facilitate precise analysis of the production of science and stimulate the production of new hypotheses, as researchers note statements not made within the network of claims. When these maps are combined with the scientific models, they hold the potential to revolutionize the way scientists collaborate, identify research problems and validate hypotheses. Finally, because the research clarifies what is published and where, and traces the careers of scientists and inventors, it generates insights into which factors channel scientific attention and how these factors can be harnessed to guide the most powerful public investments in innovation.
