CCF: AF: EAGER: Systematic Construction of Heuristic Algorithms for Combinatorial Optimization Problems in Biology
International Computer Science Institute, Berkeley CA
Investigators
Abstract
In many practical situations heuristic algorithms reliably give satisfactory solutions to real-life instances of optimization problems, despite evidence from computational complexity theory that the problems are intractable. The central goal of this project is to contribute to an understanding of this seeming contradiction, and to put the construction and evaluation of heuristic algorithms on a firmer footing. The project will develop a general empirical method for selecting an optimal choice of parameters and subroutines within a well-defined heuristic algorithmic strategy. The method is empirical, and assumes the availability of a supply of typical problem instances which can be used to train the algorithmic strategy and evaluate its performance. The search for optimal parameters and subroutines will be treated as an optimization problem in its own right. The project will introduce the concept of an implicit hitting set problem, demonstrate that a broad class of NP-hard combinatorial optimization problems can be framed as implicit hitting set problems, and systematically derive and evaluate heuristic algorithms for three problems in this class: multi-genome alignment, the feedback vertex set problem, and the feedback arc set problem. Further examples of heuristic algorithm design will be drawn from computational molecular biology, where a long-term goal is to extract, from large data sets, information about how proteins work together to carry out life processes at a cellular level. This investigation will focus on protein-protein interaction (PPI) networks, in which the vertices are the proteins within a species and the edges indicate direct interactions between proteins. The goal will be to discover conserved protein modules: richly interacting sets of proteins whose patterns of interaction are conserved across two or more species. Such modules correspond to subgraphs of a PPI network that have many internal edges, relatively few external edges, and a structure that persists from one species to another.There are several existing software packages for finding conserved protein modules, and the algorithms developed in this project will be compared systematically with these packages. This research leads naturally to fundamental abstract problems in computational graph theory, such as the problem of partitioning a graph into vertex-disjoint subgraphs with a high density of edges, or the colorful subgraph problem, in which each vertex of a graph is assigned a color, and the goal is to find a small connected subgraph containing at least one vertex of each color. The project will systematically derive and evaluate heuristic algorithms for these problems. Since heuristic algorithms are an essential tool for the practical solution of NP-hard optimization problems, and since most optimization problems arising in practice are NP-hard, the attempts to systematize the creation, comparison and validation of heuristic algorithms may have repercussions for many application areas beyond those pursued in this proposal.
View original record on NSF Award Search →