Maintenance and Development of RepeatMasker

$361,970 · R01 · FY2008 · HG · NIH

Institute For Systems Biology, Seattle WA

Abstract

DESCRIPTION (provided by applicant): Most eukaryotic genomes include vast numbers of interspersed repeats (IRs), which are the remnants of mostly selfishly amplified transposable elements. Transposable elements have an exceptionally wide-ranging mutagenic effect on genomes, while recognition of IRs provides unparalleled information on genome evolution and is crucial in many aspects of bioinformatics. This grant would continue support for the maintenance and further development of RepeatMasker, a computational tool that has become the de facto standard for identification and characterization of IRs, and support the development of RepeatModeler, a program designed to derive RepeatMasker-grade databases of IR consensus sequences. The source code for these tools is freely available to the public. Development will emphasize the following:

a) With the rapid growth in the number of sequenced mammalian species, the building of mammalian repeat libraries has become our highest priority. The RepeatModeler program already excels in its consensus-building ability and IR classification scheme, but it is still in an early phase and many modules need to be developed.

b) RepeatMasker development will initially be focused on the annotation modules. These need to be parallelized and made auditable in order to link annotations to the relevant database entries. We also present strategies to improve RepeatMasker's detection of ancient, highly fragmented IRs and of IRs in draft genomes, and one that allows it to recognize genomic recombination sites within IRs.

c) For many applications of RepeatMasker, including interspecies genome alignments and inference of species phylogenies, knowledge of the age and species distribution of IRs is crucial. We aim to automate and refine the process of "phylogenetic labeling" of consensus sequences in the library.

d) We will further develop our website by adding our transcript prediction program FEAST, increasing the number of pre-analyzed genomes, expanding our new protein-based repeat masking services, and optionally presenting data in graphical form.

View original record on NIH RePORTER →