III: Small: Combinatorial Optimization Methods for Problems in Molecular Biology and Genetics
International Computer Science Institute, Berkeley CA
Investigators
Abstract
In virtually every area of science, engineering and commerce, optimization problems arise for which no known solution method is guaranteed to yield satisfactory solutions to large instances using a reasonable amount of computation time. Computational complexity theory confirms the intractability of most of these problems. Nevertheless, these problems demand to be solved in practice, and very often heuristic algorithms work quite well in most cases, although not in the worst case. The unreasonable success of heuristic algorithms is one of the great mysteries in computer science. This project will undertake four case studies in the design of heuristic algorithms. 1. Implicit hitting set problems are a class of constraint satisfaction problems in which the constraints are too numerous to list explicitly. Many classic optimization problems fit this model. A generic approach will be investigated which aims to enumerate a small set of constraints sufficient to determine the optimal solution. This approach has proved successful in solving a significant group of genome alignment problems. 2. Integer programming can be used to reconstruct a network of interacting genes and proteins from experimental data. A network is modeled as a wiring diagram of interconnected gates, and integer programming is used to reconstruct the logical functions of the gates. The method has successfully mapped the subnetwork of TLM genes that control telomere length in yeast, and will be applied to other models. 3. Another focus of the research is the discovery of aggregates of interacting proteins that form regulatory modules conserved in evolution. Given such a module in one species, the problem of finding a corresponding module in a second species is abstracted as a graph-theoretic problem called the colorful subgraph problem. The plan is to develop fast and accurate heuristic algorithms for the solution of this problem. 4. The final problem is partitioning a graph into a large number of small dense clusters. An application is given from genetics, involving the automatic inference of the familial relationships among a group of genetically related individuals. The problems studied in this project can serve as test cases for understanding the applicability of fundamental algorithmic strategies such as constraint relaxation, integer programming and local search.
View original record on NSF Award Search →