XPS: FULL: Collaborative Research: Rethinking Architecture Support for Memory Consistency
Ohio State University, The, Columbus OH
Investigators
Abstract
Despite decades of progress, writing correct parallel software to realize the value of modern parallel computer hardware remains extremely difficult. A key problem is that today's computer systems do not give all programs clear behavioral guarantees; "ill-synchronized" code, in which parallel computations are incompletely or incorrectly coordinated, has ill-defined, often destructive behavior. This problem is a key theoretical and practical flaw in nearly all parallel computer systems. This proposal addresses this challenge, by proposing a new class of parallel computer architectures with strong behavioral guarantees, even for ill-synchronized code. The key idea is to make systems safely terminate ill-synchronized program executions before they can cause problems. To avoid degrading availability, the project includes mechanisms to avoid terminating program executions when possible, by falling back to more permissive, yet safe and predictable behavioral guarantees, and by resolving potential errors caused by ill-synchronized code. The intellectual merits of the project are that it provides crucial behavioral guarantees even to ill-synchronized parallel code. The project eliminates outdated hardware models that not only provide inadequate behavioral guarantees, but are also complex, and power-hungry. The project is the first in this domain to directly address availability and correctness together. The project's broader significance and importance are that it will improve the reliability of all parallel systems, which affects all aspects of life: medicine, energy, transportation, health, defense, and business. The stronger guarantees provided by this project avoid costly, dangerous failures and decrease the cost of application development, even in mature languages. The project will generate results relevant to industry and will influence academia through publication. The project will directly influence secondary and higher education in computing, fostering a diverse, future STEM workforce. To provide strong behavioral guarantees to all code -- even if incorrectly synchronized -- the proposed architectures provide region-atomic memory consistency guarantees for coarse-grained code regions. In these architectures, a program's execution is either a serialization of code regions, or it terminates with an exception that indicates an error could have left memory inconsistent. The architectures provide this strong memory consistency model to all program executions, departing from mainstream approaches to coherence and consistency that favor weaker guarantees without a clear benefit in complexity or performance. In systems executing ill-synchronized code, frequent exceptions may too often terminate program executions, degrading availability. The proposed architectures avoid degrading availability by tolerating consistency violations with a well-defined snapshot isolation semantics that avoids exceptions, but does not guarantee serializability of code regions. The architectures further address availability by resolving exceptions, leveraging commutativity of code to avoid unnecessary exceptions for commutative operations, as well as using dynamic symbolic analysis to resolve exceptions by combining symbolic memory updates.
View original record on NSF Award Search →