
ABI Innovation: Discovering Elements of Extreme and High Conservation in Eukaryotic Genomes

$768,279 · FY2015 · BIO · NSF

Worcester Polytechnic Institute, Worcester MA

Investigators

Abstract

The project addresses a fundamental problem: finding regions of eukaryotic genomes that share a remarkable property. When compared between the genomes of two or more species, the sequences from these regions appear identical, or differ by only a very small number of mutations. Although the phenomenon was discovered more than 10 years ago, its origins and its functional implications for living organisms remain a mystery. This is not surprising: until now, cataloging all such regions between all species has been computationally infeasible and has required a conceptually different computational approach. In this project, a series of novel algorithms will be developed that take full advantage of both the kind of biological data being processed and the type of hardware they run on. Optimization of the algorithms will be guided by a biological hypothesis on the distribution of the extreme genomic elements. The algorithms will also be designed to make optimal use of the processors' internal memory, one of the main bottlenecks of conventional software. The success of this project will have important implications. It will provide new insights into eukaryotic evolution and introduce a new functional class of genomic elements. Moreover, based on the recent literature and preliminary data from the project team, these extreme elements may be implicated in a number of complex genetic disorders in humans. Another important advance is the introduction of a new computational paradigm in genomics and bioinformatics: designing algorithms that are optimized for both the biological data and the computing hardware. The project also proposes a series of interlinked educational activities that target undergraduate and high-school students and also reach out to high-school teachers, helping them integrate bioinformatics and computational genomics into the high-school biology curriculum. By including both the teacher and the student components, the project aims to further broaden its impact by encouraging women and underrepresented minorities to pursue careers in genomics and informatics and by involving them in outreach to their parents and high-school peers.

The goal of this project is to develop computational methodology for the fast and comprehensive detection of regions of extreme and high conservation within one and across multiple eukaryotic genomes. The two main classes of genomic elements targeted by this study are long identical multispecies elements (LIMEs), which include but are not limited to ultraconserved elements (UCEs), and near-identical multispecies elements (NIMEs), highly similar genomic regions that allow only a few mismatches. The project has three main research aims. Aim 1 is to develop new tools and improve existing tools for the comprehensive genome-wide determination of regions of extreme and high conservation (LIMEs and NIMEs). Aim 2 is to apply the developed algorithms to build a complete atlas of elements of extreme and high conservation in eukaryotes and to test biological hypotheses on their evolution, structural organization, and relationship with genetic variation within species populations. Finally, Aim 3 is to disseminate data on regions of extreme and high conservation, together with the computational tools for their detection. The educational activities have three components: (1) attracting high-school students to undergraduate computational sciences by educating them about research in computational genomics and bioinformatics, (2) attracting new undergraduate students to the Ph.D. program in informatics, and retaining existing ones, by involving them in interdisciplinary research, and (3) supporting high-school teachers in integrating bioinformatics and computational genomics into the high-school biology curriculum. The datasets and software tools will be freely available to the public at http://korkinlab.org.
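To make the two element classes concrete, the following is a minimal sketch of what "identical" and "near-identical" regions mean for a pair of sequences. It is not the project's algorithm: it is a textbook seed-and-extend search over toy strings, and the function names, the `min_len` threshold, and the `max_mismatches` threshold are hypothetical choices made for illustration only. A genome-scale method would need the data- and hardware-aware design the abstract describes.

```python
from collections import defaultdict


def exact_shared_regions(a: str, b: str, min_len: int) -> list[tuple[int, int, int]]:
    """Return maximal exact matches between a and b as (start_a, start_b, length).

    Only matches of at least min_len characters are reported (a LIME-like
    criterion; min_len is an illustrative parameter, not a value from the project).
    """
    # Index every substring of length min_len in b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - min_len + 1):
        index[b[j:j + min_len]].append(j)

    matches = []
    for i in range(len(a) - min_len + 1):
        for j in index.get(a[i:i + min_len], ()):
            # A seed that can be extended to the left is part of a longer
            # match that starts earlier, so skip it to avoid duplicates.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right for as long as the sequences agree.
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches


def mismatches(x: str, y: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(x, y))


def is_near_identical(x: str, y: str, max_mismatches: int) -> bool:
    """NIME-like criterion: equal-length regions differing in few positions."""
    return mismatches(x, y) <= max_mismatches
```

For example, calling `exact_shared_regions("GGATTACAGCTTCC", "TCATTACAGCTTAG", 6)` reports the single shared region `ATTACAGCTT`. The index over one sequence keeps the search from comparing every pair of positions, but it still holds all seeds in memory, which is exactly the kind of bottleneck that becomes limiting at genome scale.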
