Open science platform for cancer dependency prediction and analysis using deep learning and large language models

$237,007 | R03 | FY2025 | CA | NIH

University of Pittsburgh at Pittsburgh, Pittsburgh, PA

Abstract

Cancer dependencies, genes essential for the survival and proliferation of cancer cells, represent critical vulnerabilities that could be leveraged to develop transformative cancer treatments. Advances in machine learning and the availability of large-scale datasets, such as the Cancer Dependency Map (DepMap), have revolutionized the study and prediction of these dependencies; however, significant barriers remain. Biomedical researchers often lack the computational expertise to fully leverage these resources, while computational scientists face challenges integrating diverse datasets and ensuring reproducibility. These obstacles hinder open science efforts and limit the broader impact of cancer dependency research. This proposal responds directly to RFA-OD-24-010, "Building Sustainable Software Tools for Open Science," by addressing these critical barriers through innovative and accessible solutions. Our central hypothesis is that user-centered platforms integrating advanced computational models with open science principles will empower researchers to identify and study cancer dependencies, driving innovation in precision oncology. To address this, our overarching objective is to develop two complementary tools that lower barriers for biomedical and computational researchers and enable collaborative advancements in cancer dependency research. Aim 1 focuses on developing an intuitive, web-based tool that enables biomedical researchers to predict and interpret cancer dependencies without requiring advanced computational skills. Featuring a user-friendly interface and guided by a large language model-powered assistant, this tool will provide a seamless experience for data upload, dependency prediction, and result interpretation. Aim 2 will develop a scalable and modular computational pipeline to support computational scientists in refining and building new prediction models for cancer dependency while integrating diverse datasets. Built on a flexible and reproducible framework, the pipeline will allow customization of workflows to meet specific research needs and support deployment across diverse computational environments. Both tools adhere to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) to ensure broad adoption and long-term sustainability. By fostering open science practices, including open-source development, comprehensive documentation, and community engagement, our study will bridge critical gaps and promote collaboration across cancer research and computational biology communities. This project is led by a collaborative investigator team with extensive expertise and experience in cancer dependency research, artificial intelligence, bioinformatics tool development, and community engagement and collaboration. Together, these resources will empower biomedical researchers to perform sophisticated analyses and enable computational scientists to refine innovative models. We expect this study to accelerate therapeutic target discovery and advance precision oncology. By addressing key accessibility and scalability challenges, this proposal aligns closely with the NIH's Strategic Plan for Data Science and will catalyze transformative progress in cancer biology.