Chronic lung disease phenotyping and genomics in the Veterans Health Administration
VA Boston Health Care System, Boston, MA
Abstract
Chronic lung diseases (CLDs), including asthma, chronic obstructive pulmonary disease (COPD), and interstitial lung disease (ILD), cause significant functional limitation and disability and were, collectively, the 4th leading cause of death in the United States in 2019. Veterans are enriched for environmental exposures that contribute to the pathogenesis of CLDs (e.g., smoking, environmental toxins) and have a higher burden of CLD than the general population. However, CLD development and progression are highly heterogeneous. Investigations into the genomic contributions to CLD susceptibility within the Veterans Health Administration (VHA) have been limited by the lack of robust CLD phenotypes. Key challenges in clinically based CLD phenotyping include the lack of blood-based biomarkers, variable implementation of diagnostic testing (e.g., underutilization of spirometry), and inconsistent availability of test results in the VA electronic health record (EHR). These barriers result in frequent misclassification and make it difficult to identify both cases and controls for large-scale epidemiological and genomic studies. To address these knowledge gaps, we propose a multi-faceted approach to CLD phenotyping and validation, followed by genome-wide association studies (GWAS), construction of polygenic risk scores, and examination of gene-by-environment interactions, including pharmacogenomics, within the Million Veteran Program (MVP). First, we have developed a novel natural language processing (NLP)-boosted EHR-based phenotyping algorithm that will generate quantitative probabilities for the presence or absence of multiple CLDs (COPD, emphysema subtype, asthma, interstitial lung abnormalities (ILA), fibrosis subtype) for all VHA users (~16.8 million) through 2018.
Following algorithm optimization and internal validation against 500 gold-standard charts (already adjudicated in duplicate), we will perform prospective validation using mortality and respiratory-related healthcare utilization data collected after 2018 (Aim 1). Second, we propose independent phenotyping through quantitative imaging analysis (QIA) of chest computed tomography (CT) data available in a subset of participants enrolled in MVP (Aim 2). In preliminary work, we established a secure prototype pipeline behind the VA firewall for the analysis of archived clinical chest CT data using VA Technical Reference Model (TRM)-approved software platforms (3D Slicer, Chest Imaging Platform [13,14]). For the current proposal, through a national network of collaborating VA pulmonary investigators (J. Curtis, VISN10; C. Wendt, VISN23; V. Fan, VISN20; C. Wells, VISN7; F. Kheradmand, VISN16), full-resolution clinical chest CT data from a subgroup of individuals enrolled in MVP (n=6,000-9,000) will be analyzed to generate objective, quantitative measurements of parenchymal lung disease (e.g., percent emphysema and ILA). Each of these QIA-based phenotypes will be (i) validated against prospective mortality and healthcare utilization outcomes (available from Aim 1), (ii) used as an independent secondary validation of EHR algorithm-derived emphysema, ILA, and fibrosis (from Aim 1), and (iii) used as a distinct, quantitative phenotype in genomic investigations (Aim 3). Third, we propose to (i) conduct GWAS within the MVP using EHR algorithm-assigned CLD case-control status (Aim 1) and QIA phenotypes (Aim 2), (ii) construct polygenic risk scores (PRS) for each CLD, and (iii) examine gene-by-environment and pharmacogenomic interactions in CLDs (Aim 3).
Findings will be replicated in independent cohorts: the International COPD Genetics Consortium (ICGC) and UK Biobank (COPD, emphysema), Genetic Epidemiology Research in Adult Health (GERA) and Mass General Brigham Biobank (asthma), and an international cohort investigating idiopathic pulmonary fibrosis (ILA, fibrosis). Upon successful completion of the project, we will have developed and validated robust CLD phenotypes and identified risk loci for CLDs. These results will shed light on disease mechanisms and could serve as the foundation of personalized medicine initiatives for CLD management within the VHA.