CAREER: Information Extraction and Integration with Applications to Healthcare Question Answering
University of California-San Diego, La Jolla CA
Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). Getting a comprehensive answer to a medical question from Web search engines can be time-consuming because health information, like much information on the Web, is scattered across many websites. One difficulty is the prevalent vocabulary mismatch between sources, caused by synonyms, morphological variations, abbreviations, and differences in word order. Another challenge is that health information can be complex and difficult to understand for people who are not medical experts. Supportive visual representations can help many readers, for example, people reading text that is not in their first language, older adults, and non-experts more generally. To address these challenges, this project brings health information together in a single place by assimilating, synthesizing, and storing it in a broad-coverage resource with a shared vocabulary. Such a resource provides fast access to comprehensive answers to health questions, saving time for people who would otherwise have to read many different sources to connect the dots and get a complete answer to their question. To help people understand complex health information, the project will generate summaries that combine text with supportive visualizations.

This project will develop novel techniques for integrating information from disparate sources. This entails identifying relevant content and reconciling the mismatch between the vocabularies of different sources. To enforce a shared vocabulary across sources, the project will develop novel entity linking techniques that are not limited to recognizing entities seen at training time, since new diseases, treatments, and other types of medical entities can emerge. For broad coverage, the project will consider content written by clinicians, researchers, and consumers.
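To make the vocabulary mismatch concrete, here is a minimal Python sketch of dictionary-based normalization that maps different surface forms of a mention to one shared concept ID. It is an illustration only, not the project's method. The abbreviation table, the synonym table, the concept IDs such as `C_MI`, and the crude plural-stripping rule are all hypothetical choices made for this example.

```python
import re

# Hypothetical abbreviation table (illustrative only).
ABBREVIATIONS = {
    "mi": "myocardial infarction",
    "htn": "hypertension",
}

# Hypothetical synonym table: consumer wording -> canonical name.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}

# Hypothetical shared vocabulary: canonical name -> concept ID.
CONCEPTS = {
    "myocardial infarction": "C_MI",
    "hypertension": "C_HTN",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Crude plural stripping; a real system would use a lemmatizer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(text):
    """Expand abbreviations, strip plurals, and sort tokens so that
    word order ("infarction, myocardial") does not matter."""
    expanded = []
    for tok in tokenize(text):
        expanded.extend(tokenize(ABBREVIATIONS.get(tok, tok)))
    return tuple(sorted(stem(t) for t in expanded))


# Index every canonical name and synonym under its normalized key.
INDEX = {normalize(name): cid for name, cid in CONCEPTS.items()}
for synonym, name in SYNONYMS.items():
    INDEX[normalize(synonym)] = CONCEPTS[name]


def link(mention):
    """Return the concept ID for a mention, or None if it is unknown."""
    return INDEX.get(normalize(mention))


print(link("Heart attacks"))           # matched via synonym + plural stripping
print(link("MI"))                      # matched via abbreviation expansion
print(link("infarction, myocardial"))  # matched despite reversed word order
print(link("long covid"))              # not in the lexicon
```

The last lookup returns `None` because "long covid" was never added to the lexicon. This is the limitation the abstract points to: a fixed dictionary cannot link entities that did not exist when it was built, so handling newly emerging diseases or treatments requires techniques that go beyond lookup.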
The project will convert this information into a graph structure that can be used to learn representations that further enhance coverage of the resource while maintaining high precision. The project will also develop novel techniques for multimodal summaries of healthcare answers to facilitate understanding of complex concepts. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.