Harmonizing genomic, transcriptomic, and drug response data across pre-clinical models of cancer to support machine learning approaches for personalized cancer therapy selection

$307,975 · P30 · FY2023 · CA · NIH

University Of Utah, Salt Lake City UT

Abstract

PROJECT SUMMARY/ABSTRACT This application is being submitted in response to the Notice of Special Interest (NOSI) identified as NOT-OD-23-082. Machine learning (ML) approaches are showing great promise for predicting therapy responses in large cancer cell-line pharmacogenomic datasets. However, a large gap still exists between predicting drug response in cell lines and applying ML algorithms in precision oncology settings, i.e., selecting the therapy most likely to be effective against an individual patient's tumor. Currently, effective development of ML algorithms is hindered by the lack of well-defined, uniform datasets against which competing algorithms can be compared and performance improvements quantified. Furthermore, the data representations used by existing ML algorithms do not correspond well with the requirements of personalized cancer therapy selection. Just as importantly, drug response prediction performance in cell-line datasets does not necessarily translate to similar performance in patient-relevant cancer models or patients. Yet there is a noted lack of publicly available drug response datasets from patient-relevant preclinical models (such as patient-derived xenografts, PDXs, or patient-derived organoids, PDOs), or indeed directly from patients, with which to train and evaluate the algorithms. Finally, incorporating metadata elements into ML algorithms, e.g., those describing the relationships across cancer types and subtypes or across the various classes and subclasses of anti-cancer agents, can substantially improve drug response prediction performance, but such annotations have not been applied to key datasets.
We propose to address these challenges by (1) creating an AI/ML-ready, fully harmonized dataset of genomic, transcriptomic, and drug response data across three distinct cancer models, as well as cancer patients; (2) enriching cancer drug response datasets with cancer type, cancer therapy, and FDA approval status annotations; (3) packaging and sharing the cancer drug response datasets as easily digested data structures that serve as inputs to AI/ML prediction algorithms, partitioned into canonical training and testing subsets, together with scripts for dataset searching and filtering; and (4) demonstrating the AI/ML-readiness of these unified genomic, transcriptomic, and drug response datasets via precision therapy response prediction. This proposal will not only integrate publicly available datasets, but also add unique data from patient-derived models of cancer developed and characterized with funding from the parent project of this application, i.e., the Cancer Center Support Grant (2P30CA042014) awarded to the Huntsman Cancer Institute. Our team combines outstanding expertise in cancer biology and cancer model development, in computational biology and bioinformatics, and in the development of computational algorithms for predicting cancer therapy responses, and will generate a powerful, AI/ML-ready dataset for improving cancer therapy selection algorithms and their application in precision oncology settings.
