III: Small: Exploiting and Extending Integer Linear Programming in Computational Biology
University Of California-Davis, Davis CA
Investigators
Abstract
Integer Linear Programming (ILP) is a versatile modeling and optimization technique that has been increasingly used in computational biology in inventive ways that differ from its traditional uses. Spectacular improvements in the speed of modern ILP solvers and computers now often allow the solution to important instances of hard computational problems that lack guaranteed-efficient solution methods. This project will further the exploitation of integer programming to solve realistic instances of computational problems in biology. The broader impact of this project will be through the improved computational tools that will be created for use in biology and perhaps medicine. This could have a transformative impact in biology. Broader impact will also be in the training of graduate and undergraduate students, and in outreach to biologists involved in computation. This project will also impact algorithmic computer science, by demonstrating a shift in emphasis from seeking worst-case efficient solutions or approximations to problem instances of all sizes and properties, to seeking methods that are effective in finding exact solutions to realistic problem instances of great importance. There are intellectual challenges in modeling biological phenomena for integer programming, non-trivial technical issues, and educational and outreach efforts needed to make the advances available to a larger biological community. The research questions are of three broad types: 1) Many research questions concern how to formulate, and more efficiently implement, a large succession of related ILP computations. This arises due to the need to evaluate the biological fidelity, and statistical significance, of a solution, often leading to changes in the biological model and ILP formulation, and many successive ILP computations. 2) It is often possible to devise mathematically-equivalent, but very different, ILP formulations for a computational problem, with large differences in the time and memory needed to solve the ILP instances. Many research questions concern how to devise the best ILP formulations for sub-classes of biological problems, and to identify successful new idioms (or modeling insights) that are common in these formulations. 3) Many technical issues come from the use of ILP as a language to express solutions to biological problems that are not naturally posed in terms of linear constraints. This exploits the fact that integer programming, being NP-hard, can express any problem in NP. However, the consequence is that the formulations are often non-intuitive, and huge in comparison to traditional ILP formulations, with a higher ratio of variables to equations than is expected by ILP solvers.
View original record on NSF Award Search →