GrantIndex
CCRI: ENS: Collaborative Research: Enhancing R for Scalability and Deployment

$1,799,705 · FY2019 · CSE · NSF

Northeastern University, Boston MA

Abstract

Data analytics software systems are integral to the fabric of science. The ability to acquire, process, and analyze complex data is at the core of disciplines that range from high-energy physics and astronomy to chemistry and biology. While scientists can exploit repositories of tools optimized and refined over the years, significant new challenges are posed by the rapidly evolving characteristics of modern scientific datasets. New analyses rely on interactive computer programming languages that are open source and that allow rapid, interactive data exploration. This project aims to strengthen and enhance the R software infrastructure. R is the language and environment with the largest collection of free and reusable data analytics software packages. This project will help R adapt to the evolving needs of researchers in diverse fields. The technical enhancements, embodied by new and updated software components, will open opportunities for researchers to contribute to the community infrastructure and enable a number of new research directions building on R as a platform. Lastly, this project's engagement and outreach plans reach both user and developer communities and create a pipeline of future contributors.

The proposed enhancements to the R infrastructure address four key needs and are enablers of research. In terms of scalability, the R environment needs support for both scaling up computation to support out-of-core data and scaling down computations with compact data formats. In terms of deployment, the R language needs to support multiple deployment formats. This project envisions addressing this need by supporting separate compilation of a subset of R to both native binaries and WebAssembly: the former would have the benefit of performance and could be linked as a library, while the latter would benefit from the ubiquity of JavaScript, running in any browser. In terms of robustness, the R language needs tools to verify and validate native code. This project will extend existing checkers with a detailed understanding of R semantics so that they automatically check for a variety of errors statically and optionally insert run-time checks for cases that cannot be proved statically. In terms of community outreach, there is a need for a pipeline for training potential R developers. This project will create educational material and operate two yearly summer schools.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →