ABI Development: Increasing concurrency for improved performance of the BEAGLE library
University of Maryland, College Park, College Park, MD
Investigators
Abstract
Estimating the evolutionary history of organisms, known as phylogenetic inference, is often a critical step for understanding how organisms adapt in complex biological systems. Modern phylogenetic analyses involve obtaining DNA sequence data from a set of organisms and using model-based methods to infer a binary tree that reflects how closely the organisms are related to one another. This tree represents the evolutionary history of the organisms going back to their most recent common ancestor and is, in essence, a subset of the overall tree of life. In addition to providing a basic understanding of the evolution of life, these phylogenetic relationships are very important in understanding the evolutionary dynamics, timing, and spread of many disease-causing organisms, such as viruses (e.g., HIV, influenza, and Ebola). The most effective phylogenetic inferences involve statistical methods, either maximum likelihood or Bayesian analysis. Both of these methods share the same computational bottleneck, which is the calculation of the likelihood of proposed trees. These likelihood calculations are extremely computationally intensive, and hence accurate phylogenetic analyses become a bottleneck in many studies of the tree of life. Therefore, accelerating phylogenetic analyses is critical to produce timely results that can inform public health and disease containment actions, as well as to understand fundamental problems in evolutionary biology more broadly. This project improves the performance and capabilities of software that speeds up these analyses and thus decreases the time to scientific results. The Broad-platform Evolutionary Analysis General Likelihood Evaluator (BEAGLE) library and Application Programming Interface (API) is a high-performance likelihood-calculation platform for evolutionary models.
It defines a uniform API and includes a collection of efficient implementations for calculating a variety of likelihood-based models on different hardware devices, such as graphics processing units (GPUs) and multicore CPUs. The project brings a new approach to the problem of computing the likelihood function in evolutionary analyses: it configures concurrent communication by moving computation that previously required multiple BEAGLE instances into a single instance. Operating under a single BEAGLE instance allows better coordination of concurrent communication by, for example, reducing memory transfers and load-balancing the computation across potentially heterogeneous devices. The emphasis on concurrency originates from a deep understanding of the specific characteristics of the computational problem (computing the likelihood function) and of how it is used for analyses within the domain sciences (phylogenetics and population genetics), as well as from recognizing the opportunities that trends in processor design present for increasing concurrency. The overarching theme comprises the following recurring sub-themes: i) reformulation: identifying and decomposing computation into practical independent operations; ii) minimization: reducing operations, such as memory transfers and execution overhead; and iii) control: configuring flow and communication to maximize concurrency, including across devices. The research project reformulates the library and its API, and focuses on consolidating more capabilities into a single library instance through the following research initiatives: (1) exploiting additional concurrency within a library instance, thus improving concurrent communication by reformulation and minimization; (2) developing the library to fully leverage multi-device systems, thus improving concurrent communication by controlling load-balancing; (3) exploring numerical precision and scaling in a parallel computing context; and (4) extending the capabilities of the library with new models for statistical phylogenetics and population genetics.