Collaborative Research: Experimental and Computational Studies of DNA Binding by Human Paralogous Transcription Factors
Duke University, Durham NC
Abstract
All cells in an individual contain the same genetic information. However, different cells use this information differently, and each cell expresses only a small fraction of genes to produce the corresponding proteins. This process is tightly regulated by specialized proteins called transcription factors, which bind DNA in the neighborhood of specific genes and influence their expression. This project studies human transcription factors to understand how they identify their specific DNA targets across the genome. The project focuses on human transcription factors that have similar structures but interact with different genomic regions in the cell, and thus perform different functions. The goal of this research is to understand how closely related factors are able to recognize distinct genomic sites, a question that cannot be thoroughly addressed using current DNA binding specificity models. During the course of the project, high-quality transcription factor-DNA binding data and DNA binding specificity models will be generated. The data and models will be made available to the scientific community. The data generated in this study is expected to become a valuable resource for future development and testing of protein-DNA binding models, and for studies of differences among related transcription factors. Graduate, undergraduate, and high school students will participate in data generation and analysis, as well as dissemination of the results using visualization software, web platforms, and scientific posters and articles. Thus, students of various age groups will be introduced to molecular and structural biology through practical studies of protein-DNA interactions. In addition, students will become contributors to a bioinformatics research database and will have the opportunity to co-author scientific publications early in their career. 
Characterizing protein-DNA interactions and understanding the role of both proteins and DNA is vital to interpreting regulatory elements in the genome, and to understanding how changes in the protein or the DNA binding sites will affect cell function. Focusing on 16 transcription factors (TFs) from six different protein families, the goal of this project is to understand how paralogous TFs from each family are able to recognize distinct genomic sites, a question that cannot be addressed using current data and models. The project will use a combined experimental and computational approach to determine how DNA sequence and shape contribute to differential DNA binding by paralogous TFs. First, the project will use carefully designed high-throughput assays to measure in vitro binding of related TFs to thousands of putative genomic binding sites. These assays, called genomic-context protein-binding microarrays (gcPBM), minimize noise and bias in the experimental measurements, making the data ideal for comparing the intrinsic sequence preferences of closely related factors. These high-quality data will then be used to characterize the DNA binding preferences of paralogous TFs using regression models based on the DNA sequence content of putative binding sites. Next, the project will investigate the contribution of DNA shape, compared to higher-order DNA sequence features, to the differential DNA binding specificities of paralogous TFs. Finally, the new computational models of DNA binding specificity will be validated using two approaches: 1) the models will be tested in vitro by introducing mutations in the DNA binding sites and measuring their effects in new gcPBM assays; and 2) the models will be validated against in vivo TF binding data to verify that they can explain, at least in part, the differential in vivo binding patterns of paralogous TFs.
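The sequence-based regression step can be illustrated with a minimal sketch, assuming position-specific one-hot k-mer features and ridge regression fitted in closed form with NumPy; the project's actual feature sets and fitting procedure may differ. The helper names (`kmer_features`, `fit_ridge`, `predict`, `r_squared`), the two "paralogs", their preferred bases, and all binding intensities are hypothetical and simulated, not gcPBM data. DNA shape features are left out because they would require shape predictions from an external tool.

```python
import itertools

import numpy as np

BASES = "ACGT"


def kmer_features(seq, k_max=2):
    """Position-specific one-hot encoding of all k-mers with k = 1..k_max."""
    blocks = []
    for k in range(1, k_max + 1):
        kmers = ["".join(p) for p in itertools.product(BASES, repeat=k)]
        index = {km: j for j, km in enumerate(kmers)}
        for i in range(len(seq) - k + 1):
            block = np.zeros(len(kmers))
            block[index[seq[i:i + k]]] = 1.0  # mark the k-mer starting at position i
            blocks.append(block)
    return np.concatenate(blocks)


def fit_ridge(seqs, y, k_max=2, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.array([kmer_features(s, k_max) for s in seqs])
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w, k_max


def predict(model, seqs):
    w, b, k_max = model
    X = np.array([kmer_features(s, k_max) for s in seqs])
    return X @ w + b


def r_squared(y_true, y_pred):
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)


# Simulated intensities for two hypothetical paralogs: both reward a shared
# core dinucleotide "GA" at positions 3-4, but they prefer different flanking
# bases at position 1 (A for paralog A, T for paralog B).
rng = np.random.default_rng(0)


def simulate(seqs, flank_base):
    core = np.array([2.0 if s[3:5] == "GA" else 0.0 for s in seqs])
    flank = np.array([1.0 if s[1] == flank_base else 0.0 for s in seqs])
    return core + flank + rng.normal(0.0, 0.1, len(seqs))


seqs = ["".join(rng.choice(list(BASES), 8)) for _ in range(3000)]
train, test = seqs[:2000], seqs[2000:]
y_a_train, y_a_test = simulate(train, "A"), simulate(test, "A")
y_b_train = simulate(train, "T")

# Do dinucleotide (2-mer) features improve held-out accuracy over 1-mers alone?
r2_mono = r_squared(y_a_test, predict(fit_ridge(train, y_a_train, k_max=1), test))
r2_di = r_squared(y_a_test, predict(fit_ridge(train, y_a_train, k_max=2), test))

# Compare the two paralog models on a site and a single-base mutant of it.
model_a = fit_ridge(train, y_a_train)
model_b = fit_ridge(train, y_b_train)
wild_type, mutant = "CACGACCC", "CTCGACCC"  # position 1: A -> T
diff_wt = predict(model_a, [wild_type])[0] - predict(model_b, [wild_type])[0]
diff_mut = predict(model_a, [mutant])[0] - predict(model_b, [mutant])[0]
print(f"held-out R^2: 1-mer {r2_mono:.2f}, 1-mer+2-mer {r2_di:.2f}")
print(f"A minus B: wild type {diff_wt:+.2f}, mutant {diff_mut:+.2f}")
```

The held-out comparison mirrors, in miniature, the question of whether higher-order features add information: because the simulated core effect depends on two adjacent bases jointly, the 2-mer model should fit it better than the additive 1-mer model. The mutant comparison mirrors the proposed in vitro validation, where a single-base change is predicted to reverse which paralog binds more strongly.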
Through the identification of characteristics that contribute to differential DNA binding of closely related TFs, this project represents a significant step forward in understanding how these factors are able to select different genomic binding sites, despite sharing a common DNA binding domain. This will ultimately lead to a better understanding of how TFs have evolved to regulate different target genes and perform different functions in the cell. This award is co-funded by the Directorate for Biological Sciences (the Division of Emerging Frontiers, and the Genetic Mechanisms Program in the Division of Molecular and Cellular Biosciences) and by the Directorate for Mathematical and Physical Sciences (the Mathematical Biology Program in the Division of Mathematical Sciences).