Cancer Center Support Grant
University of California, Los Angeles, Los Angeles, CA
Abstract
This application is being submitted in response to "Administrative supplements for research leveraging novel data science approaches to address integration of modifiable risk factors on cancer outcomes". Lung cancer remains the leading cause of cancer-related deaths globally. The Global Burden of Disease 2019 study attributes 62% of lung cancer deaths to smoking, 15% to PM2.5 air pollution, 5.8% to secondhand smoke, and 4% to radon, with additional links to occupational and dietary factors. Existing risk models like PLCOm2012, Bach, and LCRAT incorporate clinical variables such as age, race, BMI, smoking history, and personal/family history of lung cancer. Recent deep learning models, such as Sybil, utilize high-dimensional features from low-dose CT scans and outperform clinical models in predicting up to 6-year risk. However, these models often consider either clinical or imaging-derived features, not both, and typically use linear models like logistic regression, which fail to capture the complex interplay between risk factors. We propose developing and validating a novel data-driven approach to estimate lung cancer risk by integrating clinical, self-reported, and imaging-derived features, incorporating static and longitudinal information about modifiable risk factors. Our objective is to accurately estimate lung cancer risk changes due to multiple modifiable factors: smoking status, smoking intensity, BMI, and environmental exposures (e.g., radon, asbestos) from self-reported questionnaires. We hypothesize that using a recurrent neural network to represent the interplay between high-dimensional observable features and changes in risk factors can yield more accurate predictions, further tailoring screening inclusion criteria.
Our aims are: (1) develop and validate a recurrent neural network integrating clinical and imaging-derived features using NLST data to predict 6-year lung cancer risk, and (2) validate the risk model using UCLA data and disseminate it to the research community. The data for our proposed model are collected within a 24-hour window, typically reported in questionnaires at the screening exam. The expected outcome will be a data-driven model that jointly incorporates longitudinal changes in smoking status, smoking intensity, body mass index, and environmental factors to predict how intervening on these modifiable risk factors can reduce an individual's risk of lung cancer.