RNA Structural Motif Analysis Revisited: Towards a Systematic Understanding of Noncoding RNA as Modular Biomolecules
University of Central Florida, Orlando, FL
Abstract
Project Summary
Non-coding RNAs (ncRNAs) play crucial roles in catalysis, gene expression regulation, RNA splicing, and many other biological processes. These functions depend on their 3D structures, which contain recurrent substructural components known as RNA structural motifs (RSMs). RSMs are conserved in sequence composition, base interaction patterns, and 3D geometry, and they serve as fundamental building blocks of RNA structure and function. RSM dysfunction has been linked to various diseases, so studying these motifs is important both for understanding RNA-related diseases and for developing potential cures. Our previous NIH-funded research established a strong connection between RSMs and their distinctive base interaction patterns (pairing and stacking), and we developed computational methods for detecting and clustering RSMs based on these patterns. Compared with approaches that superimpose motif 3D structures, this interaction pattern-centric approach is more sensitive, because it tolerates geometric variation among functional homologs, and faster, because it bypasses structural superposition altogether. However, base interaction annotation remains imperfect, which makes it difficult to distinguish function-preserving covariations from annotation errors. To address this, we will develop a more accurate base interaction annotation framework by reframing the problem as 3D point cloud object detection, a task from computer vision. With improved base interaction patterns, we expect to identify more RSM instances in the growing repository of RNA 3D structures in the Protein Data Bank (PDB). This expanded collection will enable systematic categorization of RSMs, which is crucial for defining their function-determining characteristics and the variations permitted at each residue. We propose a de novo clustering method based on similarity in both base interaction patterns and 3D geometry.
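The abstract does not say how the two similarity signals will be combined. As a minimal sketch, assuming each motif instance is reduced to a set of annotated base interactions plus one 3D point per residue, the code below scores pattern similarity with a Jaccard index, scores geometry without superposition by comparing intra-motif distance matrices, blends the two with a hypothetical `weight`, and groups motifs by single-linkage at a hypothetical `threshold`. All function names, parameters, and interaction labels are illustrative assumptions, not the investigators' method.

```python
import math
from itertools import combinations

# Hypothetical motif representation (not the project's data model):
#   interactions: set of (i, j, label) over motif-local residue indices; labels
#                 are illustrative, e.g. "cWW"/"tHS" for pairs, "s35" for stacking.
#   coords:       one 3D point per residue (e.g. a C1' atom), in angstroms.
Interaction = tuple[int, int, str]
Point = tuple[float, float, float]


def interaction_similarity(a: set[Interaction], b: set[Interaction]) -> float:
    """Jaccard similarity of two base-interaction sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def distance_rmsd(p: list[Point], q: list[Point]) -> float:
    """Superposition-free geometric difference: RMS deviation between the
    intra-motif pairwise distance matrices of two equal-length motifs."""
    if len(p) != len(q):
        raise ValueError("motifs must have the same number of residues")
    diffs = [math.dist(p[i], p[j]) - math.dist(q[i], q[j])
             for i, j in combinations(range(len(p)), 2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 0.0


def combined_similarity(a_int, a_xyz, b_int, b_xyz, weight=0.5, scale=2.0):
    """Blend pattern and geometry similarity into a score in [0, 1].
    `weight` and `scale` are illustrative placeholders, not tuned values."""
    geometry = math.exp(-distance_rmsd(a_xyz, b_xyz) / scale)
    return weight * interaction_similarity(a_int, b_int) + (1 - weight) * geometry


def threshold_clusters(motifs, threshold=0.6):
    """Single-linkage clustering via union-find: link every pair of
    (interactions, coords) motifs whose combined similarity reaches
    `threshold`, then return connected components as lists of indices."""
    parent = list(range(len(motifs)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(motifs)), 2):
        if combined_similarity(*motifs[i], *motifs[j]) >= threshold:
            parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(motifs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Usage: two motifs differing in one interaction label, plus an unrelated one.
xyz = [(0.0, 0.0, 0.0), (5.9, 0.0, 0.0), (5.9, 5.9, 0.0)]
m1 = {(0, 2, "cWW"), (1, 2, "tHS")}
m2 = {(0, 2, "cWW"), (1, 2, "tSH")}
m3 = {(0, 1, "s35")}
print(round(combined_similarity(m1, xyz, m2, xyz), 3))  # 0.667
print(threshold_clusters([(m1, xyz), (m2, xyz), (m3, xyz)]))  # [[0, 1], [2]]
```

Single-linkage is used here only because it is short to write; it can chain dissimilar motifs together through intermediates, so a practical de novo method would likely need a more robust grouping criterion.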
Additionally, we will develop algorithms to compare RSM families and explore their organizational architecture: whether it is hierarchical, with superfamilies and subfamilies, or flat, with nearly equally distinct families. Using this knowledge, we will quantify the conservation and variation within each RSM family by constructing probabilistic profiles. These profiles will facilitate the detection of RSM-encoding regions in multiple sequence alignments of genomic sequences. By combining detected RSMs with secondary structure information, we will define a novel representation of RNA structure: the RSM-enriched Secondary Structure (RSMe-SS). We hypothesize that RSMe-SS will be informative for functional inference; to evaluate this hypothesis, we will construct a mathematical model connecting RSMe-SS with molecular functions. If successful, this approach could revolutionize medical genomics by enabling reference-independent functional investigation of novel ncRNA biomarkers and SNPs discovered in high-throughput experiments. In summary, we propose an in-depth study of RSMs and will extend their applications to ncRNA functional inference from sequence data. This project aims to elucidate fundamental RNA structure-function relationships, with significant impact on basic science. It will also advance medical research by extending the scope of medical genomics beyond proteins to include ncRNAs.
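One simple way to picture a probabilistic profile used for scanning aligned sequences is a position-specific log-odds matrix. The sketch below is an illustration under stated assumptions, not the project's model: `build_profile`, `log_odds`, and `scan_alignment` are hypothetical names; columns are treated as independent, so the base-pair covariation that pair-aware models (such as the covariance models in Infernal) capture is ignored; the background is uniform; and windows containing a gap are simply skipped.

```python
import math

ALPHABET = "ACGU"
UNIFORM = {base: 0.25 for base in ALPHABET}  # assumed uniform background


def build_profile(instances: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column base probabilities from aligned, equal-length motif
    instances, smoothed with an additive pseudocount."""
    width = len(instances[0])
    if any(len(seq) != width for seq in instances):
        raise ValueError("motif instances must be aligned to equal length")
    profile = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in instances:
            if seq[pos] in counts:  # gaps and ambiguity codes are not counted
                counts[seq[pos]] += 1
        total = sum(counts.values())
        profile.append({base: c / total for base, c in counts.items()})
    return profile


def log_odds(profile, window: str, background=UNIFORM) -> float:
    """Sum over columns of log2(P_profile(base) / P_background(base))."""
    score = 0.0
    for column, base in zip(profile, window):
        if base not in column:
            return -math.inf  # non-ACGU symbol: treat the window as unscorable
        score += math.log2(column[base] / background[base])
    return score


def scan_alignment(profile, rows: list[str], min_score: float = 0.0):
    """Slide the profile along aligned rows; at each start column, average
    the log-odds of rows with no gap ('-') in the window and report
    (start, mean_score) when the mean reaches `min_score`."""
    width = len(profile)
    hits = []
    for start in range(len(rows[0]) - width + 1):
        windows = [row[start:start + width] for row in rows]
        scores = [log_odds(profile, w) for w in windows if "-" not in w]
        if scores and sum(scores) / len(scores) >= min_score:
            hits.append((start, sum(scores) / len(scores)))
    return hits


# Usage: a profile from three made-up GNRA-like tetraloop sequences, scanned
# along a toy two-row alignment.
profile = build_profile(["GAAA", "GAGA", "GCAA"])
hits = scan_alignment(profile, ["UUGAAAUU", "UU-AAAUU"])
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))  # 2 3.94
```

The pseudocount keeps unseen bases from receiving zero probability, which would otherwise make any window containing them score negative infinity.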
Original record: NIH RePORTER