Statistical and Bioinformatic Methods for Genomic Data
Harvard University, Cambridge MA
Abstract
Statistical and bioinformatic methods are proposed to identify regulatory motifs in non-coding DNA sequence, drawing on the wealth of sequence and microarray data now being produced for model organisms. These methods will then be extended to human data to help interpret mutations in non-coding DNA that are associated with disease susceptibility. The specific aims include developing statistical methods for motif discovery that combine sequence and microarray data. The proposed approach searches for candidate motifs in the regulatory regions of the genes most over-expressed under an experimental condition, using a new motif-finding algorithm that focuses on the subsets of sequences most enriched for a target motif. The association between motif occurrence in each gene's regulatory region and the global gene expression pattern is then tested to identify significant motifs. Methods are also proposed to identify genes that are co-regulated by combinations of motifs, using regression and analysis-of-variance techniques. In addition, Bayesian hierarchical temporal models for microarray data are proposed to identify genes whose expression changes over time and to find clusters of genes with similar temporal expression patterns. The proposed methods will be evaluated on data from the model organisms Saccharomyces cerevisiae and Bacillus subtilis. Finally, software implementing these methods will be developed, documented, and made publicly available to genomics practitioners.
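The two-step idea in the abstract (find candidate motifs among the most over-expressed genes, then test each candidate's association with expression across all genes) can be illustrated with a minimal sketch. This is not the proposed algorithm, which this record does not specify. Exact k-mers stand in for motifs, the enrichment score (fraction of top genes containing a word minus the fraction of remaining genes containing it) is a hypothetical heuristic, and the association test is an ordinary least-squares regression of expression on motif count. The function names and the toy data are invented for illustration.

```python
import math
from collections import Counter


def count_occurrences(seq, word):
    """Count (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def kmer_set(seq, k):
    """Distinct k-mers present in seq (presence/absence, not counts)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_words(sequences, expression, k=4, top_n=3, n_candidates=5):
    """Rank k-mers by enrichment in the top_n most over-expressed genes.

    Hypothetical score: fraction of top genes whose regulatory sequence
    contains the word minus the fraction of the remaining genes that do.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    if not 0 < top_n < len(ranked):
        raise ValueError("top_n must leave at least one gene in each group")
    top, rest = ranked[:top_n], ranked[top_n:]
    top_counts = Counter(w for g in top for w in kmer_set(sequences[g], k))
    rest_counts = Counter(w for g in rest for w in kmer_set(sequences[g], k))
    score = {w: top_counts[w] / len(top) - rest_counts[w] / len(rest)
             for w in top_counts}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(score, key=lambda w: (-score[w], w))[:n_candidates]


def regress_expression_on_motif(sequences, expression, word):
    """Least-squares fit y = a + b*x over all genes, where x is the motif
    count in a gene's regulatory region and y is its expression value.

    Returns (slope b, t statistic for H0: b = 0).
    """
    genes = sorted(expression)
    n = len(genes)
    if n < 3:
        return 0.0, 0.0  # too few genes to estimate a slope and its error
    x = [count_occurrences(sequences[g], word) for g in genes]
    y = [expression[g] for g in genes]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, 0.0  # motif count does not vary across genes
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return b, (math.copysign(math.inf, b) if b != 0 else 0.0)
    return b, b / se


# Invented toy data: upstream sequences and log expression ratios.
SEQUENCES = {
    "g1": "ATTGACGTTGACCA",
    "g2": "CCTGACAATTTGAC",
    "g3": "GGTGACTTAACCGG",
    "g4": "ATATATCCGGCCAA",
    "g5": "CCAAGGTTCCAAGG",
    "g6": "GTGTCACACGTATA",
}
EXPRESSION = {"g1": 2.5, "g2": 2.1, "g3": 1.2,
              "g4": -0.3, "g5": 0.1, "g6": -0.5}

if __name__ == "__main__":
    for word in candidate_words(SEQUENCES, EXPRESSION):
        b, t = regress_expression_on_motif(SEQUENCES, EXPRESSION, word)
        print(f"{word}: slope={b:.2f}, t={t:.2f}")
```

In the toy data the word TGAC appears only upstream of the three most over-expressed genes, so it ranks first and shows a large positive slope. A realistic implementation would replace exact words with position weight matrices, use background models for enrichment, and correct for testing many candidates; extending the regression to several motif-count columns (with interaction terms) is one way to approach the motif-combination aim.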
View original record on NIH RePORTER →