III: Small: Computational Methods for Multi-dimensional Data Integration to Improve Phenotype Prediction
The University Of Central Florida Board Of Trustees, Orlando FL
Abstract
Multi-omics is the integration and analysis of multiple types of biological data, including genomics, transcriptomics, proteomics, and epigenomics. By combining these diverse omics data, researchers can gain a comprehensive understanding of biological systems at multiple molecular levels. However, integrating data from different omics platforms is challenging because the characteristics and quality of the generated data vary from platform to platform. Another obstacle is deciphering the complex interactions and regulatory networks across omics layers, along with their temporal dynamics. In addition, making multi-omics models interpretable and translating their findings into actionable biological insights remain open challenges for phenotype prediction. To tackle these research challenges, this project aims to develop a machine learning-based, multi-dimensional, multi-omics data integration system that extracts more accurate molecular signatures for biological interpretation and phenotype prediction. The project's outcomes will lower barriers to analyzing high-dimensional omics profiles and reduce the time and cost typically associated with biological and biomedical research. Furthermore, the project's dissemination and engagement activities will encourage minority students to pursue careers in computer science and bioinformatics.
This project integrates omics data along three dimensions: (1) integrating molecular features generated from RNA-seq data, (2) integrating multi-omics data from different high-throughput sequencing technologies with their regulatory interaction networks, and (3) integrating omics and time-lapse imaging data. The primary objective is to develop comprehensive computational methodologies that address critical challenges in molecular signature identification and interpretation using multi-omics platforms.
To achieve this goal, the project defines three research thrusts: (1) transcript variant integration, in which a biological pathway-encoded transformer will be developed to integrate transcript variants and mRNA expression from RNA-seq samples and identify biological signatures associated with phenotypes; (2) multi-omics data integration, in which a generative adversarial network model will be developed to predict biological interactions between the biological layers in the multi-omics data and to impute missing values in the omics profiles; and (3) imaging and omics integration, in which a deep learning-based framework will be developed to integrate multi-omics profiles and time-lapse microscopy imaging data to enhance phenotype prediction. The integrative machine learning models developed in this project can be applied to computer science applications that integrate high-dimensional, heterogeneous data sources for sample classification. In biological research, this work will facilitate data analytics on large-scale multi-omics profiles and imaging data, leading to improved knowledge interpretation and phenotype prediction compared to current biological measures.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.