Natural language processing for precision medicine and clinical and consumer health question

$436,325 | ZIA | FY2017 | LM | NIH

National Library of Medicine

Linked publications & trials

Paper 32365190 Paper 32065628 Paper 31592532 Paper 29854131 Paper 29409442 Paper 28269901 Paper 28269888

Abstract

The Repository for Informed Decision Making, Clinical Question Answering, and Consumer Health Question Answering projects address the above objectives by developing knowledge-based and machine learning approaches to extracting and structuring information from biomedical literature and other types of text (such as clinical notes and registered clinical trials). The approaches target the following types of information: 1) the diseases and conditions; 2) the numbers, co-morbidities, and socio-demographic characteristics of study subjects/participants, such as species, gender, smoking status, and alcohol consumption; 3) the therapeutic and diagnostic interventions; 4) the study and publication types; 5) the end-points and outcomes of the studies; 6) drug interactions; and 7) adverse drug reactions.

In FY2017, we developed a number of approaches to help understand information requests sent to NLM customer services and long queries submitted to the MedlinePlus search engine. Information requests sent to customer services are often several paragraphs long and provide the background and context that the customers believe will help us understand their needs. For example, customers often describe several generations of their families affected by a disease and ask whether their children will have it. The long MedlinePlus queries consist of one or two sentences and are often phrased as questions. Both forms of request are usually ungrammatical and rife with misspellings, abbreviations, and informal language. We have developed a spellchecker for consumer language that performs adequately on the misspellings that matter for understanding the customers' needs. After correcting spelling, our system employs three approaches. The first two, a knowledge-based method and a supervised machine learning method, identify the main points of the request, such as the disease or drug of interest and the type of information sought about it.
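The spelling-correction and knowledge-based steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the vocabulary, the focus-term list, the question-type triggers, and the function names `correct_spelling` and `extract_main_points` are all hypothetical, and the standard-library `difflib` similarity match stands in for whatever spelling model the actual spellchecker uses.

```python
import difflib
import re

# Hypothetical toy vocabulary; a real system would need a large consumer-health lexicon.
VOCABULARY = ["diabetes", "insulin", "metformin", "treatment",
              "symptoms", "causes", "inherited", "children"]

# Hypothetical focus terms (the disease or drug of interest) and their categories.
FOCUS_TERMS = {"diabetes": "disease", "insulin": "drug", "metformin": "drug"}

# Hypothetical trigger words mapped to the type of information requested.
QUESTION_TYPES = {"treatment": "treatment", "symptoms": "symptoms",
                  "causes": "cause", "inherited": "inheritance"}


def correct_spelling(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Lowercase and tokenize the text, replacing each out-of-vocabulary
    token with its closest vocabulary entry when one is similar enough."""
    corrected = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in vocabulary:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


def extract_main_points(text):
    """Return the first focus term and first question type found after correction."""
    tokens = correct_spelling(text)
    focus = next((t for t in tokens if t in FOCUS_TERMS), None)
    question_type = next((QUESTION_TYPES[t] for t in tokens if t in QUESTION_TYPES), None)
    return {"focus": focus,
            "focus_type": FOCUS_TERMS.get(focus),
            "question_type": question_type}
```

Under these assumptions, calling `extract_main_points("what is the treatmnt for diabetis")` corrects both misspellings and yields the focus `diabetes` with the question type `treatment`. The high similarity cutoff is a deliberate choice in this sketch: it leaves short function words untouched so that only near-miss spellings of vocabulary terms are rewritten.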
These two methods extract the main points, which we found sufficient to automatically search MedlinePlus and find authoritative, relevant pages for 65% of the requests. The third approach finds similar questions that already have authoritative answers, for example answers provided by NIH institutes.

Our clinical question answering system is based on the framework for asking well-formed questions developed by evidence-based medicine experts. Their analysis showed that presenting a clinical information need as a four-part question frame (patient characteristics/problem, planned intervention, comparison, and desired outcome) helps formulate search engine queries that lead to relevant results. We developed methods for automatic extraction of question frames from information requests, automatic query formulation, and automatic extraction of answers from retrieval results. The LHC CQA 1.0 system extracts the bottom-line advice from biomedical publications and aligns the question frames with the answers to find the best answer. The CQA 1.0 system is currently used to support the development of evidence-based care plans at the NIH Clinical Center, to provide the bottom line for images retrieved in the LHC Open-i system, and to provide summaries of biomedical articles in the LHC Open Summarizer.
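The four-part question frame and the idea of aligning question frames with answers can be sketched as a small data structure. This is an illustration under stated assumptions, not the CQA 1.0 design: the `QuestionFrame` class, its field names, the conjunctive query syntax, and the exact-match `alignment_score` are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class QuestionFrame:
    """Hypothetical container for the four parts of a well-formed clinical question."""
    problem: Optional[str] = None        # patient characteristics/problem
    intervention: Optional[str] = None   # planned intervention
    comparison: Optional[str] = None     # comparison
    outcome: Optional[str] = None        # desired outcome

    def to_query(self) -> str:
        """Join the filled slots into a conjunctive query (assumed syntax)."""
        parts = [getattr(self, f.name) for f in fields(self)]
        return " AND ".join(f'"{p}"' for p in parts if p)


def alignment_score(question: QuestionFrame, answer: QuestionFrame) -> float:
    """Fraction of the question's filled slots whose value equals the
    corresponding slot of the answer frame, ignoring case."""
    filled = [f.name for f in fields(question) if getattr(question, f.name)]
    if not filled:
        return 0.0
    matched = sum(
        1 for name in filled
        if (getattr(answer, name) or "").lower() == getattr(question, name).lower()
    )
    return matched / len(filled)
```

For example, a frame with problem `type 2 diabetes`, intervention `metformin`, and outcome `HbA1c reduction` produces the query `"type 2 diabetes" AND "metformin" AND "HbA1c reduction"`. An answer frame that agrees on the problem and intervention but reports a different outcome scores 2/3. A realistic aligner would likely need concept normalization rather than exact string matching, so the score here only shows the shape of the comparison.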
