EAGER: CCF-AF: Combinatorial and Probabilistic Aspects of Biological Problems
Cuny Hunter College, New York NY
Investigators
Abstract
An approach based on combinatorics and probability is proposed to tackle a number of problems in computational biology. This is part of an exploratory research for an EAGER grant that aims at extracting simple and elegant mathematical abstractions rather than focusing on the ?nitty-gritty? details of the biological problem. The rationale is the following: (1) Properties of combinatorial objects lead directly to algorithms for solving the problems that generate them and (2) combinatorial and probabilistic methods provide many analytical tools that can be used for determining the worst-case and expected performance of these algorithms. To establish the validity of the approach, the PIs allow some breadth by considering three exemplar problems. This, however, is consistent with their long term goal to develop mathematical models that lead to viable computational algorithms and that can explain biological behavior at an aggregate level. RNA interaction: An siRNA-based (small interfering RNA) treatment that may ultimately counteract HIV is now not far fetched. An siRNA is a special example of an RNA molecule that interacts with another RNA (e.g. that of HIV), in this case to knockout the HIV gene. Because of its role in gene regulation mechanisms, RNA interaction has a potential to become a new class of drugs. The PIs will conduct a research based on RNA-RNA interaction graphs to predict RNA complexes resulting from the interaction of two RNAs. While this prediction problem is NP-complete, the PIs have proposed novel and efficient approximation algorithms that predict known and unusual RNA complexes in E. Coli. The PIs will improve the running time/approximation capability of the algorithms, apply the algorithms to a wide range of RNA complexes, and study the extension of the algorithms to handle multiple RNAs (not just two) using a combination of RNA-RNA interaction graphs and random search. Protein interaction sites: Similarly, protein interaction is crucial for determining the function of protein complexes. Interaction graphs, however, do not provide a suitable model here because protein interaction is more complex. Instead, predicting the interaction sites of a protein becomes a central task. The PIs will implement a combinatorial approach based on folding the protein on a torus (closed helix) and geometrically grouping amino acids with certain properties (e.g. hydrophobic) to obtain clusters. The clusters represent potential interaction sites. One motivation for this approach is that hydrophobic helices tend to stay away from the solvent and, hence, to interact. Together with a mathematical model of random tori (that will also be developed), this new approach will potentially eliminate the need to predict the 3D folding of the protein (highly intractable problem) and overcomes the simplicity of methods that, otherwise, are entirely sequences based (ignore the geometry). Low complexity sequences LCS: While structure resulting from folding and/or interaction is an important aspect, the lack of structure in proteins raises an important question about the function of their sequences, especially when those sequences are preserved. The cell wall genes of fungi contain an abundance of LCS that are mostly structure-free. Understanding the function of LCS will help guide any effective medical treatment, for instance against uterine infections, that must target the fungus through its cell wall interface. The PIs believe that LCS evolve using a mechanism similar to DNA replication error, resulting in sequences with large deviations in length (a power law distribution). The PIs propose a probabilistic model of evolution that explicitly accounts for lengths and exhibits a similar distribution. This model will help explain the type of evolution that LCS undergo and will shed light on their function in the cell wall. The model will also provide an alternative to alignment-based methods which, despite many existing efforts, usually fail in the presence of LCS. The intellectual merit of the proposal lies in providing essential foundation for a number of combinatorial/probabilistic problems that require a strong knowledge of various fields of mathematics and algorithms, and provide radical ways to capture different aspects of Biology. Therefore, while the research has roots in combinatorial mathematics and probability, it has simultaneously a broader impact on biological sciences. Despite Biology being the driving force, the formulations are general enough and extend the research beyond its biological significance: RNA interaction introduces an interesting geometric graph problem that avoids intersection of edges. Protein interaction sites lead to an elegant problem on regular graphs. Evolution of LCS is captured by a general random walk that is applicable for many systems that exhibits random elongation and shortening over time, e.g. words in a sentence (linguistics). The proposal has also a broader impact on the Hunter College QuBi (Quantitative Biology) initiative where it could provide substantial projects material for the newly developed QuBi courses.
View original record on NSF Award Search →