XPS: Full: FP: Collaborative Research: Sphinx: Combining Data and Instruction Level Parallelism through Demand Driven Execution of Imperative Programs

$575,876FY2015CSENSF

Michigan Technological University, Houghton MI

Investigators

Abstract

Title: XPS: Full: FP: Collaborative Research: Sphinx: Combining Data and Instruction Level Parallelism through Demand Driven Execution of Imperative Programs It has become increasingly difficult to improve the performance of processors so that they can meet the demands of existing and emerging workloads. Recent emphasis has been towards enhancing the performance through the use of multi-core processors and Graphics Processing Units. However, these processors remain difficult to program and inflexible to adapt to dynamic changes in the available parallelism in a given program. Although the computer architecture and programming language community continues to innovate and make important gains towards better programmability and better designs, it remains that parallel programming is inherently costly and error prone, and automatic parallelization of programs is not always feasible or effective. The intellectual merits of this project are the development of a new program execution paradigm and the establishment of critical compiler and micro-architecture mechanisms so that one can design processors that can be easily programmed using existing programming languages and at the same time surpass the performance of existing parallel computers. The project's broader significance and importance are wide-spread: the deployment of such processors will push the limits of computation in every field of science and commerce. The execution paradigm under consideration is a previously unexplored execution model, the demand-driven execution of imperative programs (DDE). The DDE paradigm rests on a solid theoretical framework and promises to efficiently deliver very high-levels of fine-grain parallelism. This parallelism is extracted from a program written in an imperative language such as C, and it is realized by means of an effective compiler-architecture collaboration mechanism using a common, single-assignment form for the program representation. DDE processors can extract instruction-level parallelism much more efficiently than existing superscalar processors as the paradigm does not require dynamic dependency checking. Such processors can fetch, buffer, and execute many more instructions in parallel than current superscalar processors. Owing to its dependence-driven instruction fetching and execution, the paradigm leads to extremely scalable designs, as the communication is naturally localized and synchronization is inherent in the model. Conventional thread-level parallelism (TLP) is orthogonal to DDE, and thus DDE designs can exploit both ILP and TLP. DDE architectures thus represent promising building blocks for extreme-scale machines.

View original record on NSF Award Search →