CiC (SEA): Large Scale Prediction of Transcription Factor Binding Sites for Gene Regulation Using Cloud Computing
University of North Carolina at Charlotte, Charlotte, NC
Abstract
With the development of sequencing technologies, over 1,400 bacterial genomes have been sequenced, and this number is rising exponentially. To understand these organisms, we first need to identify all the functional sequences in their genomes. Although tremendous advances have been made in the computational prediction of gene-coding sequences, our ability to accurately predict another important type of functional sequence, transcription factor binding sites (TFBSs), is still very limited because of the notorious difficulty of the problem. In this project, researchers will develop accurate algorithms and tools for predicting TFBSs in genomes at a large scale using a graph-theoretic approach. The algorithms will then be parallelized and adapted to the Microsoft Azure cloud platform to predict TFBSs in thousands of sequenced bacterial genomes. Because efficiently and accurately predicting TFBSs in all sequenced bacterial genomes remains an unsolved task, the algorithms, tools, and predictions resulting from this project will greatly aid in deciphering gene regulatory networks in these genomes. This will directly impact our ability to understand bacteria and use them for various purposes, and thus provide the foundations for new strategies for renewable energy production, environmental protection, and infectious disease prevention.