Penalized mixture cure models for identifying genomic features associated with outcome in acute myeloid leukemia

$259,253 | R01 | FY2023 | LM | NIH

Ohio State University, Columbus OH

Linked publications & trials

Paper 39529956 Paper 39367245 Paper 38702786 Paper 38547174 Paper 35792553

Abstract

Molecular features associated with time-to-event outcomes, such as overall or disease-free survival, may be prognostically relevant or potential therapeutic targets. Therefore, analyzing data from high-throughput genomic assays together with clinical follow-up data has been of growing interest. The Cancer Genome Atlas (TCGA) Project has collected baseline demographic data, clinical characteristics, and follow-up data for 11,125 patients across 32 different cancer types, and corresponding tissue samples were processed to examine SNPs, copy number, methylation, miRNA expression, and mRNA expression. Because the number of variables (P) exceeds the sample size (N), one strategy frequently employed when associating molecular features with survivorship data is to fit univariable Cox proportional hazards (PH) models, followed by adjustment for multiple hypothesis tests using a false discovery rate approach. However, most chronic conditions and diseases, including cancer, are likely caused by multiple dysregulated genes or mutations. It is therefore critical to fit multivariable models in the presence of a high-dimensional covariate space. Traditional statistical methods cannot be used when the number of features exceeds the sample size (i.e., P > N), whereas penalized methods perform automatic variable selection and accommodate the P > N scenario. Penalized approaches including LASSO, smoothly clipped absolute deviation (SCAD), adaptive LASSO, and Bayesian LASSO have all been extended to Cox's PH model for handling high-dimensional covariate spaces. However, when modeling survival or other time-to-event outcomes, the Cox PH model assumes that all subjects will eventually experience the event of interest, an assumption that is violated when a subset of subjects is cured. When a subset of subjects in the data is cured, mixture cure models should be fit instead.
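For orientation, the standard mixture cure formulation can be written as below. The notation is an assumption introduced here for illustration and does not come from the abstract: $\pi(z)$ is the probability that a subject with covariates $z$ is susceptible (not cured), and $S_u(t \mid x)$ is the survival function of susceptible subjects with covariates $x$. The logistic form for $\pi(z)$ is one common choice, not necessarily the one used in this project.

```latex
% Population survival: cured subjects never experience the event,
% susceptible subjects follow the latency distribution S_u.
S_{\mathrm{pop}}(t \mid x, z) = \bigl(1 - \pi(z)\bigr) + \pi(z)\, S_u(t \mid x),
\qquad
\pi(z) = \frac{\exp(b_0 + z^{\top} b)}{1 + \exp(b_0 + z^{\top} b)}.
```

Because $S_u(t \mid x) \to 0$ as $t \to \infty$, the population survival curve plateaus at the cured fraction $1 - \pi(z)$ rather than dropping to zero, which is the behavior a standard Cox PH model does not accommodate.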
Although mixture cure models have been described for traditional settings where the number of samples exceeds the number of covariates, limited variable selection methods and no methods for high-dimensional model fitting currently exist for mixture cure models. Therefore, this project will overcome a critical barrier to progress in this field by developing penalized parametric and semi-parametric mixture cure models applicable for high-dimensional datasets. The specific aims of this application are to: (1) Develop penalized parametric mixture cure models for high-dimensional datasets; and (2) Develop a penalized semi-parametric proportional hazards mixture cure model for high-dimensional datasets. For both aims we will characterize the performance of the methods using extensive simulation studies, develop software, and distribute R packages to CRAN. In aim (3) we will identify molecular features associated with cure and survival using our large unique AML dataset from the Alliance for Clinical Trials in Oncology and assess robustness of findings using AML datasets from Gene Expression Omnibus and The Cancer Genome Atlas project. This research will fill a critical gap as there are currently no mixture cure models for high-dimensional data. We anticipate application of our methods to our AML data will enhance existing risk stratification systems used in daily clinical practice that determine treatment intensity and modality.
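To make the idea of a penalized parametric mixture cure model concrete, the following is a minimal sketch of one possible objective function, not the project's method or software. It assumes a logistic model for the probability of being susceptible, an exponential distribution for the event times of susceptible subjects, and a LASSO (L1) penalty on both coefficient vectors. The function name `penalized_neg_loglik` and all parameter names are hypothetical.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def penalized_neg_loglik(times, events, X, b0, b, g0, g, lam):
    """L1-penalized negative mean log-likelihood of an exponential mixture cure model.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    X      : list of covariate rows (used for both model parts in this sketch)
    b0, b  : intercept and coefficients of the logistic susceptibility model
    g0, g  : intercept and coefficients of the exponential log-rate model
    lam    : penalty weight (intercepts are left unpenalized)
    """
    n = len(times)
    loglik = 0.0
    for t, d, x in zip(times, events, X):
        # Probability that this subject is susceptible (not cured).
        pi = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))
        # Event rate among susceptible subjects.
        rate = math.exp(g0 + sum(gj * xj for gj, xj in zip(g, x)))
        if d == 1:
            # Observed event: subject is susceptible and failed at time t.
            loglik += math.log(pi) + math.log(rate) - rate * t
        else:
            # Censored: subject is either cured or susceptible and still event-free.
            loglik += math.log((1.0 - pi) + pi * math.exp(-rate * t))
    penalty = lam * (sum(abs(v) for v in b) + sum(abs(v) for v in g))
    return -loglik / n + penalty


# Toy usage with two subjects and one covariate (values are made up).
toy_value = penalized_neg_loglik(
    times=[1.0, 2.0], events=[1, 0], X=[[0.5], [-0.5]],
    b0=0.0, b=[0.2], g0=0.0, g=[-0.1], lam=0.1,
)
```

In practice this objective would be minimized over the coefficients with an optimizer suited to non-smooth penalties (for example, coordinate descent or a proximal method), and `lam` would be chosen by cross-validation or an information criterion. Those steps are omitted here.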
