Text Mining for High-fidelity Curation and Discovery of Gene-drug-phenotype Relationships

$331,290 | R01 | FY2017 | LM | NIH

Stanford University, Stanford CA

Linked publications & trials

Papers (PMID): 33486067, 33398279, 32968711, 32873964, 32511288, 31797637, 31797632, 31797619, 31627020, 31114840, 31051039, 30833575, 30223787, 29985971, 29750902, 29635477, 29490008, 29218917, 29218882, 29218869, 29178968, 28975675, 28615003, 28495350, 27797771, 27655861, 27498067, 27115429, 27089514, 26776202, 26776186, 26338771, 26306281, 26226489, 26219079, 26198303, 25800813, 25717414, 25623160, 25601948, 24762971, 24632601, 24577151, 24516403, 24452614, 24345941, 24303232, 24010729, 24004670, 23819846, 23819482, 23414686, 22552919, 22549287, 22422992, 22219723, 22208195, 22174296, 22174289, 21992054, 21828005, 21763417, 21712246, 21676938, 21672905, 21596790, 21521153, 21481770, 21121048, 21047206, 20723615, 20122268, 20003365, 19908383, 19723347, 19604472, 19369935, 18989041, 18831785, 18229697, 18042678, 17586766, 17570144, 16245324, 15290784, 14691217, 14561879, 12888505, 12824318, 12603029, 12488462, 12220483, 12003488, 11694181, 11262955, 11189759, 10984462, 10829296, 10688361

Abstract

DESCRIPTION (provided by applicant): The rate at which new drugs are being introduced to market is decreasing, with grave implications for human health. Knowledge about the molecular mechanisms relevant to drug response is critical, but it is collected across myriad individual experiments. The published literature therefore contains rich information about how drugs and genes/gene products interact to produce phenotypes at the molecular, cellular and organismal levels, but this textual data requires substantial additional processing. Consequently, there are efforts to manually curate the literature and extract relationships between three key entities (genes/gene products, drugs and phenotypes), with the goal of representing the information in structured, computable formats. Although automated text mining may ultimately replace expert human curators, its best current role is to triage the literature and bring potentially important information to the attention of human curators. Recent advances in computational natural language processing (NLP), both generally and within our laboratory specifically, offer an opportunity to extract relationships between key entities with high accuracy. In particular, we have prototyped methods that take a relatively small set of examples of a relationship of interest (e.g., gene-drug pairs in which the gene product metabolizes the drug) and then find other pairs that share a similar relationship. These methods can be applied to any relationship between our three key entity types.
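The seed-and-expand idea can be illustrated with a small sketch. This is not the laboratory's prototype, which the abstract does not describe; it is a generic stand-in under stated assumptions: entity mentions are already recognized, each gene-drug pair is represented by the words connecting its two mentions (a simple bag-of-patterns substitute for richer representations), and non-seed pairs are ranked by cosine similarity to the pooled seed patterns. The toy sentences, the function names and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy input: sentences in which a gene and a drug have already been
# recognized, reduced to (gene, drug, words between the two mentions).
SENTENCES = [
    ("CYP2D6", "codeine", "metabolizes"),
    ("CYP2D6", "codeine", "is responsible for the metabolism of"),
    ("CYP3A4", "midazolam", "metabolizes"),
    ("CYP2C19", "clopidogrel", "is responsible for the metabolism of"),
    ("EGFR", "gefitinib", "is inhibited by"),
    ("VKORC1", "warfarin", "is the target of"),
]


def pattern_profiles(records):
    """Map each (gene, drug) pair to a Counter of its normalized connecting patterns."""
    profiles = {}
    for gene, drug, between in records:
        pattern = " ".join(between.lower().split())
        profiles.setdefault((gene, drug), Counter())[pattern] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_seeds(records, seeds, threshold=0.5):
    """Return non-seed pairs whose patterns resemble the pooled seed patterns, best first."""
    profiles = pattern_profiles(records)
    seed_profile = Counter()
    for pair in seeds:
        seed_profile.update(profiles.get(pair, Counter()))
    scored = [
        (pair, cosine(profile, seed_profile))
        for pair, profile in profiles.items()
        if pair not in seeds
    ]
    return sorted(
        (item for item in scored if item[1] >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )


for pair, score in expand_seeds(SENTENCES, seeds={("CYP2D6", "codeine")}):
    print(pair, round(score, 3))
```

With CYP2D6-codeine as the only seed, the two pairs described with metabolism-type patterns are returned, while the inhibition and drug-target pairs share no pattern with the seed and fall below the threshold. Real systems would replace the raw word span with more robust features, but the seed-pooling and similarity-ranking structure stays the same.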
Thus, we propose an ambitious plan to (1) gather large corpora of biomedical text and extend existing lexicons for these entities, (2) build a database of all sentences/paragraphs relating these entities to one another, (3) create methods for accurately extracting semantically precise relationships between all pairs of entity types, and (4) validate these extracted relationships using both available gold-standard data from experimental sources and expert curator evaluation. In addition to directly supporting curation, our methods and extractions will be made available as general-purpose resources for understanding drug action.
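Steps (2) and (4) can likewise be sketched under simplifying assumptions. The code below uses tiny hypothetical lexicons, a naive regular-expression sentence splitter and whole-word lexicon matching to index every sentence in which two entities of different types co-occur, then scores a set of extracted pairs against a gold-standard set with precision and recall. Sentence co-occurrence only yields candidates and says nothing about the type of relationship, which is what step (3) must supply. All names, lexicon entries and documents here are illustrative assumptions, not the project's data or code.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical miniature lexicons; step (1) would extend real, curated ones.
LEXICONS = {
    "gene": {"CYP2D6", "CYP2C19", "VKORC1"},
    "drug": {"codeine", "clopidogrel", "warfarin"},
    "phenotype": {"bleeding", "respiratory depression"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_mentions(sentence, terms):
    """Return the lexicon terms that occur as whole words (case-insensitive)."""
    return {
        term
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE)
    }


def build_sentence_index(documents):
    """Map (type_a, term_a, type_b, term_b) to every (doc_id, sentence) mentioning both."""
    index = defaultdict(list)
    types = sorted(LEXICONS)  # ["drug", "gene", "phenotype"]
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            mentions = {t: find_mentions(sentence, LEXICONS[t]) for t in types}
            # Pair only entities of different types, each type pair in a fixed order.
            for i, type_a in enumerate(types):
                for type_b in types[i + 1:]:
                    for a, b in product(mentions[type_a], mentions[type_b]):
                        index[(type_a, a, type_b, b)].append((doc_id, sentence))
    return index


def precision_recall(predicted, gold):
    """Score a set of extracted relations against a gold-standard set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Toy corpus and gold standard (illustrative only).
DOCS = {
    "doc1": "CYP2D6 converts codeine to morphine. Warfarin dosing depends on VKORC1.",
    "doc2": "Clopidogrel response varies with CYP2C19 genotype. Warfarin can cause bleeding.",
}
GOLD = {("codeine", "CYP2D6"), ("clopidogrel", "CYP2C19"),
        ("warfarin", "VKORC1"), ("warfarin", "CYP2C9")}

index = build_sentence_index(DOCS)
predicted = {(a, b) for (ta, a, tb, b) in index if (ta, tb) == ("drug", "gene")}
print(precision_recall(predicted, GOLD))  # (1.0, 0.75)
```

Recall here is capped by what the toy corpus mentions (warfarin-CYP2C9 never appears), which mirrors why the proposal pairs gold-standard comparison with expert curator review rather than relying on either alone.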

Source: NIH RePORTER