CAREER: Dependable High Performance Scientific Computing at Extreme Scale via Algorithmic Fault Tolerance

$454,497 | FY2012 | CSE | NSF

University of California-Riverside, Riverside, CA

Investigators

Abstract

Extreme-scale high-end computing platforms are expected to be available before 2020 and to have 100 million to 1 billion CPU cores. Because these platforms contain so many components, the probability that errors occur during the execution of an extreme-scale application is expected to be much higher than it is today. The goal of this CAREER research project is to develop highly efficient techniques to detect, locate, and correct both soft and hard errors according to the specific characteristics of an algorithm. The target algorithms include (1) Krylov subspace methods for solving sparse linear systems and eigenvalue problems; (2) direct methods for solving dense linear systems and eigenvalue problems; and (3) Newton's method for solving systems of nonlinear equations. This project will create significant education outcomes by integrating the following four components: (1) establishing a supercomputing research laboratory to support senior design projects and Research Experiences for Undergraduates (REU), enhance graduate education and research, and demonstrate highly dependable applications on high-end computing platforms; (2) enriching undergraduate and graduate courses by integrating fault tolerance and high performance computing into them; (3) increasing minority student involvement by encouraging minority students to pursue graduate degrees in computing; and (4) offering free workshops to K-12 teachers and students.
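The abstract does not say which techniques the project uses. As a general illustration of what "detect, locate, and correct" can mean when a scheme exploits the structure of an algorithm, here is a minimal sketch of the classic checksum approach for dense matrix multiplication. It is an assumed example, not the project's method; the function names are hypothetical, and integer matrices are used so that checksum comparisons are exact (floating-point data would need a tolerance).

```python
def matmul(a, b):
    """Plain product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def encode_product(a, b):
    """Return the full-checksum product of A and B.

    A gets an extra row holding its column sums and B gets an extra column
    holding its row sums, so the result carries a checksum for every row
    and every column of C = A * B.
    """
    a_c = a + [[sum(col) for col in zip(*a)]]
    b_r = [row + [sum(row)] for row in b]
    return matmul(a_c, b_r)


def detect_and_correct(c_full):
    """Repair one corrupted data entry in place.

    Returns the (row, column) of the repaired entry, or None if every
    checksum matches. Raises ValueError for any other error pattern,
    including a corrupted checksum entry.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None  # detection: nothing wrong
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern is not a single corrupted data entry")
    i, j = bad_rows[0], bad_cols[0]  # location: failing row meets failing column
    # Correction: rebuild the entry from its row checksum and the other entries.
    c_full[i][j] = c_full[i][m] - sum(c_full[i][k] for k in range(m) if k != j)
    return (i, j)


# Simulate a soft error in one entry of the result, then repair it.
c_full = encode_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
c_full[1][0] = 99
location = detect_and_correct(c_full)  # (1, 0); the entry is restored to 43
```

The checksums cost one extra row and one extra column of work, which is small next to the multiplication itself. That low overhead is the usual argument for algorithm-specific schemes over recomputing the whole result.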
