Natural Language Question Understanding for Electronic Health Records
University of Texas Health Science Center Houston, Houston, TX
Abstract
DESCRIPTION (provided by applicant): Patient information in the electronic health record (EHR) such as lab results, medications, and past medical history is the basis for physician decisions about patient care. It also helps patients better understand and manage their care. Efficient access to this patient information is thus essential. One of the most intuitive ways of accessing data is by asking natural language questions. A significant amount of work in medical question answering has been conducted, yet little work has been performed in question answering for EHRs. Natural language questions can be represented in logical forms, a standard structured knowledge representation technique. This project proposes to take natural language EHR questions, both for doctors and patients, and automatically convert them to a logical form. The logical forms can then be converted to a structured query such as those used by EHRs.
A major obstacle to this approach is the lack of data containing questions annotated with logical forms. This project hypothesizes that a small set of questions can be manually annotated, and then paraphrases can be produced for each annotated question. Since paraphrasing is a simpler task than logical form annotation, crowd-sourcing techniques can be used to collect thousands of question paraphrases. This question paraphrase corpus will then be used to build a semantic grammar capable of recognizing the logical structure of EHR questions. To ensure a robust, generalizable grammar, existing NLP techniques will be used to pre-process questions, simplifying their syntactic structure and abstracting their medical concepts.
In order to develop such a method, the candidate, Dr. Kirk Roberts, requires additional training and mentoring in natural language processing and biomedical informatics. This application for the NIH Pathway to Independence Award (K99/R00) describes a career development plan that will allow Dr. Roberts to achieve the goals of this project as well as transition to a career as an independent researcher. He will be mentored by Dr. Dina Demner-Fushman, a leading medical NLP researcher, and co-mentored by Dr. Clement McDonald, a leading EHR and medical informatics researcher.
The specific aims of the project are:
(1) Build a paraphrase collection of EHR questions, where each prototype question will have many unique paraphrases. The paraphrases encompass different lexical and syntactic means of conveying the same logical form.
(2) Construct a semantic grammar for EHR questions. The grammar can then be used to convert a natural language question to a logical form.
(3) Implement an end-to-end question analyzer that generalizes EHR questions for improved parsing, parses the question into a logical form using the grammar, and converts the logical form into a leading structured EHR query format.
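As a rough illustration of the three stages named in aim (3), the Python sketch below abstracts a medical concept in a question, matches the abstracted question against a tiny pattern grammar to obtain a logical form, and renders that logical form as a structured query. Everything in it is an assumption made for illustration and is not taken from the project: the concept lexicon, the regular-expression rules standing in for a semantic grammar, the `predicate(type(concept))` notation, and the `lab_results` table with its SQL rendering.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical concept lexicon: surface phrase -> (semantic type, normalized name).
CONCEPTS = {
    "hemoglobin a1c": ("lab", "hemoglobin_a1c"),
    "a1c": ("lab", "hemoglobin_a1c"),
    "potassium": ("lab", "potassium"),
}

# Hypothetical grammar: patterns over the abstracted question -> predicate name.
# Several paraphrases are meant to land on the same rule.
RULES = [
    (re.compile(r"^(what is|what was|show me) (the|my) "
                r"(latest|last|most recent) <lab>( result| level)?$"), "latest"),
    (re.compile(r"^(what is|what was|show me) (the|my) "
                r"(highest|maximum) <lab>( result| level)?$"), "max"),
]


@dataclass(frozen=True)
class LogicalForm:
    predicate: str
    concept_type: str
    concept: str

    def __str__(self) -> str:
        return f"{self.predicate}({self.concept_type}({self.concept}))"


def abstract_concepts(question: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Normalize the question and replace the longest known concept phrase
    with a typed placeholder such as <lab>."""
    text = re.sub(r"[?.!]+$", "", question.strip().lower())
    text = re.sub(r"\s+", " ", text)
    for phrase in sorted(CONCEPTS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            concept_type, name = CONCEPTS[phrase]
            abstracted = re.sub(pattern, f"<{concept_type}>", text, count=1)
            return abstracted, (concept_type, name)
    return text, None


def parse(question: str) -> Optional[LogicalForm]:
    """Map a question to a logical form, or None if no rule applies."""
    abstracted, concept = abstract_concepts(question)
    if concept is None:
        return None
    for pattern, predicate in RULES:
        if pattern.match(abstracted):
            return LogicalForm(predicate, *concept)
    return None


def to_query(lf: LogicalForm) -> str:
    """Render a logical form as SQL over a hypothetical lab_results table."""
    order = {"latest": "taken_at DESC", "max": "value DESC"}[lf.predicate]
    return ("SELECT value, taken_at FROM lab_results "
            f"WHERE test = '{lf.concept}' ORDER BY {order} LIMIT 1")


if __name__ == "__main__":
    for q in ("What is my latest A1c?",
              "Show me the most recent hemoglobin A1c result"):
        lf = parse(q)
        print(q, "->", lf, "->", to_query(lf))
```

In this toy version, the two example questions are paraphrases that reduce to the same abstracted pattern and therefore the same logical form, which is the property the paraphrase corpus in aim (1) is intended to let a learned grammar capture at scale.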