Data-informed Modeling for DNA and RNA Aptamer Design
Arizona State University, Scottsdale AZ
Investigators
Abstract
Petr Sulc of Arizona State University is supported by an award from the Chemical Theory, Models and Computational Methods program in the Division of Chemistry to develop new data-driven methods for designing RNA and DNA binders to molecular targets. Molecular interactions underlie the function of all living organisms, and understanding them is crucial for diagnostics and therapeutics. Dr Sulc will develop machine learning models to analyze sequences of molecules that bind to a target molecule of interest (such as the surface of a virus). The models extract the structural or sequence motifs in a molecule that are crucial for its function, which makes it possible to computationally design even stronger binders. Dr Sulc’s group will train and validate the methods on both naturally occurring molecules and results from selection experiments against different targets (including viral surface proteins), with possible applications in diagnostics, therapeutics, and the basic understanding of molecular interactions. Dr Sulc will further develop outreach programs that include public lectures and online activities aimed at high school students and the general public, to broaden participation in science and develop interdisciplinary skills that combine computer modeling, simulations, and biochemistry experiments.
This project will develop new machine-learning methods for processing sequence ensembles from selection experiments. Experimental selection protocols (such as SELEX) serve to obtain DNA or RNA sequences that bind to a target of interest (e.g., a protein, a small molecule, or cells from a particular tissue): in each round, the subset of the random sequence library that binds strongly to the target is amplified and kept for the next round of selection. Such methods produce large numbers of sequences, most of them binding only weakly to the target of interest, with a few strongly binding candidates emerging at the end of the procedure.
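As a rough illustration of how round-by-round selection data of this kind might be summarized, the sketch below computes a simple enrichment ratio between an early and a late round. It is not the project's method: the function names, the pseudocount, and the toy reads are all assumptions made for the example.

```python
from collections import Counter

def round_frequencies(sequences):
    """Relative frequency of each distinct sequence within one selection round."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

def enrichment(early_round, late_round, pseudocount=1e-6):
    """Late-round frequency divided by early-round frequency.

    Only sequences observed in the late round are scored. The pseudocount
    (an arbitrary choice here) avoids division by zero for sequences that
    were not observed in the early round.
    """
    early = round_frequencies(early_round)
    late = round_frequencies(late_round)
    return {seq: f / (early.get(seq, 0.0) + pseudocount) for seq, f in late.items()}

# Made-up reads from two hypothetical rounds, for illustration only.
round_1 = ["ACGT", "ACGT", "TTGA", "GGCC", "CATG", "TTGA"]
round_5 = ["ACGT", "ACGT", "ACGT", "ACGT", "TTGA", "GGCC"]

scores = enrichment(round_1, round_5)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sequences whose share of the pool grows across rounds receive ratios above one. A ratio like this ignores amplification bias and sequencing noise, which is part of the motivation for fitting a model to the whole ensemble instead.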
This project will develop novel models derived from Restricted Boltzmann Machine architectures and use them both as classifiers and as generators of novel binders. Additionally, the models can be used to infer the sequence and structural motifs in aptamers that are key to strong affinity for the molecular target, which also makes the models interpretable. The models will be trained on naturally occurring non-coding RNAs as well as on multiple experimentally generated ensembles, and the newly generated sequences will then be verified in experiments. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
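To make the classifier and generator roles of a Restricted Boltzmann Machine more concrete, here is a minimal sketch of a textbook RBM with categorical (one-hot) visible units and binary hidden units. It is not the architecture the project will develop: the parameter layout, the names `fields`, `couplings`, and `hidden_bias`, and the hand-set toy values are assumptions for illustration, and no training is shown. A lower free energy corresponds to a higher model probability, so the free energy can serve as a score, while alternating Gibbs sampling proposes new sequences.

```python
import math
import random

ALPHABET = "ACGU"  # RNA alphabet; "ACGT" would be used for DNA

def hidden_inputs(seq, couplings, hidden_bias):
    """Total input to each hidden unit for a one-hot encoded sequence."""
    idx = [ALPHABET.index(c) for c in seq]
    return [b + sum(w[i][a] for i, a in enumerate(idx))
            for w, b in zip(couplings, hidden_bias)]

def free_energy(seq, fields, couplings, hidden_bias):
    """F(v) = -sum_i field_i(v_i) - sum_mu log(1 + exp(input_mu(v)))."""
    idx = [ALPHABET.index(c) for c in seq]
    energy = -sum(fields[i][a] for i, a in enumerate(idx))
    for x in hidden_inputs(seq, couplings, hidden_bias):
        # Numerically stable softplus: log(1 + exp(x)).
        energy -= max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return energy

def sigmoid(x):
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def gibbs_step(seq, fields, couplings, hidden_bias, rng):
    """One alternating Gibbs update: sample hidden units, then a new sequence."""
    hidden = [1 if rng.random() < sigmoid(x) else 0
              for x in hidden_inputs(seq, couplings, hidden_bias)]
    new_seq = []
    for i in range(len(seq)):
        logits = [fields[i][a] + sum(h * w[i][a] for h, w in zip(hidden, couplings))
                  for a in range(len(ALPHABET))]
        top = max(logits)
        probs = [math.exp(l - top) for l in logits]  # unnormalized; choices() normalizes
        new_seq.append(rng.choices(ALPHABET, weights=probs)[0])
    return "".join(new_seq)

# Hand-set toy parameters (not trained): 3 positions, 1 hidden unit
# that responds to a G at the first position.
fields = [[0.0] * len(ALPHABET) for _ in range(3)]
couplings = [[[0.0] * len(ALPHABET) for _ in range(3)]]
couplings[0][0][ALPHABET.index("G")] = 2.0
hidden_bias = [0.0]

score_with_motif = free_energy("GAA", fields, couplings, hidden_bias)
score_without = free_energy("AAA", fields, couplings, hidden_bias)
sample = gibbs_step("AAA", fields, couplings, hidden_bias, random.Random(0))
```

In this toy setting the single hidden unit acts as a motif detector: inspecting which positions and nucleotides carry large couplings is one way such a model can be read for interpretable sequence features.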