
Harmonizing genomic, transcriptomic, and drug response data across pre-clinical models of cancer to support machine learning approaches for personalized cancer therapy selection

$307,975 | P30 | FY2023 | CA | NIH

University Of Utah, Salt Lake City UT

Abstract

PROJECT SUMMARY/ABSTRACT This application is being submitted in response to the Notice of Special Interest (NOSI) identified as NOT-OD-23-082. Machine learning (ML) approaches are showing great promise for predicting therapy responses in large cancer cell-line pharmacogenomic datasets. However, a large gap still exists between predicting drug response in cell lines and applying ML algorithms in precision oncology settings, i.e., selecting the therapy most likely to be effective against an individual patient’s tumor. Currently, effective development of ML algorithms is hindered by the lack of well-defined, uniform datasets on which competing algorithms can be compared and performance improvements quantified. Furthermore, the data representations used by existing ML algorithms do not correspond well with the requirements of personalized cancer therapy selection. Just as importantly, drug response prediction performance in cell-line datasets does not necessarily translate to similar performance in patient-relevant cancer models or patients. Moreover, there is a noted lack of publicly available drug response datasets from patient-relevant preclinical models (such as patient-derived xenografts, PDXs, or patient-derived organoids, PDOs), or indeed directly from patients, with which to train and evaluate the algorithms. Finally, incorporating metadata elements into ML algorithms, e.g., those describing the relationships across cancer types and subtypes, or across the various classes and subclasses of anti-cancer agents, can substantially improve drug response prediction performance, but such annotations have not been applied to key datasets.
We propose to address these challenges by (1) creating an AI/ML-ready, fully harmonized dataset of genomic, transcriptomic, and drug response data across three distinct cancer models, as well as cancer patients; (2) enriching cancer drug response datasets with cancer type, cancer therapy, and FDA approval status annotations; (3) packaging and sharing the cancer drug response datasets as easily ingested data structures that serve as inputs to AI/ML prediction algorithms, partitioned into canonical training and testing subsets, together with scripts for dataset searching and filtering; and (4) demonstrating the AI/ML-readiness of these unified genomic, transcriptomic, and drug response datasets via precision therapy response prediction. This proposal will not only integrate publicly available datasets, but also add unique data from patient-derived models of cancer developed and characterized with funding from the parent project of this application, i.e., the Cancer Center Support Grant (2P30CA042014) awarded to the Huntsman Cancer Institute. Our team combines outstanding cancer biology and cancer model development expertise, computational biology and bioinformatics expertise, and expertise in the development of computational algorithms for predicting cancer therapy responses, and will generate a powerful, AI/ML-ready dataset for improving cancer therapy selection algorithms and their application in precision oncology settings.
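
As an illustration only, aims (1) and (3) above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code or schema: the `ResponseRecord` fields, the column mapping, the hash-based split rule, and all function names are assumptions introduced here. It maps source-specific column names onto one shared record type, assigns each sample to a fixed training or testing subset, and filters records by field.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping


@dataclass(frozen=True)
class ResponseRecord:
    """Hypothetical shared schema for one drug response measurement."""
    sample_id: str
    model_type: str   # e.g. "cell_line", "PDX", "PDO", "patient"
    cancer_type: str
    drug: str
    response: float   # assumed to be already on a comparable scale


def harmonize(rows: Iterable[Mapping[str, object]],
              column_map: Mapping[str, str],
              model_type: str) -> List[ResponseRecord]:
    """Rename source-specific columns to the shared schema and normalize text."""
    records = []
    for row in rows:
        records.append(ResponseRecord(
            sample_id=str(row[column_map["sample_id"]]).strip(),
            model_type=model_type,
            cancer_type=str(row[column_map["cancer_type"]]).strip().lower(),
            drug=str(row[column_map["drug"]]).strip().lower(),
            response=float(row[column_map["response"]]),
        ))
    return records


def assign_split(sample_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a sample to "train" or "test" by hashing its ID.

    Hashing the sample ID (rather than the row) keeps every measurement
    from one sample in the same subset, and the result never changes
    between runs.
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def partition(records: Iterable[ResponseRecord],
              test_fraction: float = 0.2) -> Dict[str, List[ResponseRecord]]:
    """Group records into fixed training and testing subsets."""
    splits: Dict[str, List[ResponseRecord]] = {"train": [], "test": []}
    for record in records:
        splits[assign_split(record.sample_id, test_fraction)].append(record)
    return splits


def filter_records(records: Iterable[ResponseRecord],
                   **criteria: object) -> List[ResponseRecord]:
    """Keep records whose fields equal every given keyword criterion."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]
```

Splitting by sample rather than by individual measurement is one possible design choice; it avoids the same tumor model appearing in both subsets, which would otherwise inflate apparent prediction performance. A call such as `filter_records(records, model_type="PDX", drug="paclitaxel")` would then select a single model type and agent from the harmonized records.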
