GGrantIndex
← Search

Text Mining for High-fidelity Curation and Discovery of Gene-drug-phenotype Relationships

$331,290R01FY2017LMNIH

Stanford University, Stanford CA

Investigators

Linked publications & trials

Abstract

? DESCRIPTION (provided by applicant): The rate at which new drugs are being introduced to market is decreasing, with grave implications for human health. Knowledge about the molecular mechanisms relevant to drug response is critical, but is collected in myriad individual experiments. As a result, the published literature contains rich information about how drugs and genes/gene products interact to produce phenotypes at the molecular, cellular and organismal levels, but this textual data requires substantial additional processing. As a result, there are efforts to manually curate the literature, and extract relationships between three key entities: genes/gene products, drugs and phenotypes-with the goal of representing the information in structured, computable formats. Although automated text mining may ultimately replace expert human curators, its best current role is to triage the literature and bring potentially important information to the attention of human curators. Recent advances in computational natural language processing (NLP) generally, and within our laboratory specifically, offer an opportunity to extract relationships between key entities with high accuracy. In particular, we have prototyped methods that take a relatively small set of examples of a relationship of interest (e.g. examples of gene-drug pairs in which the gene product metabolizes the drug) and then and other pairs that share a similar relationship. These methods can be applied to any relationship between our three key entity types. Thus, we propose an ambitious plan to (1) gather large corpora of biomedical text and extend existing lexicons for these entities, (2) build a database of all sentences/paragraphs relating these entities to one another, (3) create methods for accurately extracting semantically precise relationships from all pairs of entity types, and (4) validate these extracted relationships using both available gold standard data experimental sources and expert curator evaluation. In addition to directly supporting curation, our methods and extractions will be made available as general purpose resources for understanding drug action.

View original record on NIH RePORTER →