
Collaborative Research: Ideas Lab: Discovery of Novel Functional RNA Classes by Computational Integration of Massively-Parallel RBP Binding and Structure Data

$932,563 · FY2023 · BIO · NSF

Harvard University, Cambridge, MA


Abstract

Advances in genome sequencing have revealed that large parts of mammalian genomes are transcribed from DNA into RNA but are not translated into protein; these transcripts are referred to as non-coding RNAs (ncRNAs). Many classical ncRNAs play fundamental roles in biology, including translation (tRNAs, rRNAs), splicing (snRNAs), post-transcriptional gene regulation (miRNAs), and many other processes. While these known ncRNAs are important for all life, they may be just the tip of the ncRNA iceberg: we expect that many ncRNA classes remain uncharacterized. These are referred to as the "dark matter" of the genome because their biological roles are unknown. In parallel, protein studies have determined that thousands of human proteins bind to RNA. Yet it remains unknown how many of these RNA-binding proteins (RBPs) interact with ncRNAs, and with which specific ncRNAs. Our goal is to tackle both problems by combining very large-scale RNA-protein binding assays with computational analysis to uncover new classes of ncRNAs en masse. We will identify specific groups of RNAs that interact strongly with RBPs, develop models that define interaction specificity, and build classification systems that predict interactions from sequence and structural data. We will also create a web-accessible database of our findings that makes the data available to anyone, and we will train undergraduates. We expect to reveal the biological functions of novel ncRNA classes, laying the foundation for biotechnology development.

Over the past decade, global RNA-centric proteomics methods such as crosslinking and immunoprecipitation (CLIP) and related approaches have enabled unprecedented exploration of RNA-protein interactions. These efforts have vastly expanded the number of identified RBPs, with more than 4,000 human proteins (~20% of the human proteome) currently annotated as "RNA-binding" by UniProt. However, because CLIP approaches can map only a single protein at a time, it is challenging to explore the thousands of annotated RBPs. As a result, consortium efforts such as ENCODE are time-consuming and expensive and have mapped only a fraction of the RBPs in the human proteome, which makes a comprehensive RBP-ncRNA interactome nearly impossible to build with current approaches. We will use a newly developed, highly multiplexed approach to generate transcriptome-wide measurements across hundreds of RBPs in a single experiment, and combine it with cutting-edge computational and evolutionary strategies to uncover and classify novel classes of ncRNAs en masse. Our goal is to comprehensively discover and characterize novel classes of ncRNAs in the human transcriptome and to assess their phylogeny in a way that is impossible with existing methods. To achieve this goal, we will develop new experimental methods and integrative computational pipelines that systematically identify novel ncRNA classes by combining known and newly identified RNA-protein interactions and by uncovering clusters of multivalent interactions. We will identify conserved sequence and structural motifs and evolutionary patterns specific to the novel classes, and develop computational systems that recognize members of these classes from these data.

This award was the result of an Ideas Lab co-sponsored by the four divisions of the NSF Directorate for Biological Sciences. It will be co-funded by the Division of Molecular and Cellular Biosciences, the Division of Environmental Biology, and the Emerging Frontiers program. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
