Harnessing Scalable Libraries for Statistical Computing on Modern Architectures and Bringing Statistics to Large Scale Computing

$600,000 · FY2014 · MPS · NSF

University of Tennessee Knoxville, Knoxville, TN

Abstract

This project aims to increase participation by the statistics community in high performance computing (HPC) on medium- to large-scale platforms. Theoretical statisticians have potentially strong contributions to make to science where big data and HPC are involved, yet when implementing their methods on large platforms they face low-level programming languages, libraries, and runtime environments that pose a barrier high enough to keep most from entering. This project is centered on enabling exactly this community to experiment at large scale by bridging most of these barriers while using state-of-the-art approaches from the HPC community.

Broader impacts of this research include opening a new avenue for reuse of scalable HPC software by the statistics and data science communities, thus providing additional and more data-oriented feedback to HPC software research. Further, an HPC-engaged statistics community can bring statistical science to modern issues in supercomputing that increasingly need statistical thinking for quantifying uncertainty.

The open source R programming language and environment for statistical computing is an ideal vehicle for the project: it currently dominates new work in statistics, and it is widely used and rising in popularity in many other data-enabled science communities. This project will connect the R language to highly scalable HPC libraries at interfaces that make long-term sense and in a way that, in most cases, requires no change from current programming practice. In addition, ease-of-use components will be developed inside R for intuitive use of these libraries for big data input and data manipulation on large computing platforms and to bridge HPC runtime environments.

Outreach consisting of documentation, examples, a schedule of tutorials at a number of key conferences, and workshops will be used to bring the results of this project to the statistics and other data-enabled science communities.