SI2-SSI: Integrating the NIMBLE Statistical Algorithm Platform with Advanced Computational Tools and Analysis Workflows
University Of California-Berkeley, Berkeley CA
Investigators
Abstract
The software developed in this project will enable scientists to learn more from complex data and to share new analysis methods more easily. Increasingly, scientists in many fields aim to draw sound conclusions from large and complex data sets. Such fields include environmental biology, political science, education research, atmospheric and oceanic science, climate science, and many others. Data may be complex because many related variables are measured, because some measurements are not independent of one another, or both. Non-independence can arise when variables are measured repeatedly through time, when measurements are made at nearby locations, when measurements are made on groups of related individuals, or for a combination of these and similar reasons. For such cases, general statistical methods have been developed that allow researchers to tailor an analysis to each data set in order to account for the relationships among the data. Such methods rely on computer algorithms to explore the range of possible conclusions given the uncertainties inherent in limited data. Within those general methods, many varieties of specific approaches have been and continue to be developed. Thus, a major software gap has emerged: many new and evolving methods are not easily available to a wide range of scientists because there has been no software framework that makes them easy to program and disseminate. This project will support continued development of the NIMBLE software to help fill that gap. As a result, scientists will be able to use computational analysis methods more flexibly, to combine and compare different algorithms more easily, to integrate such algorithms into other software workflows, and to gain better computational performance. This will enable more advanced and more routine use of some modern computational methods for analyzing complex data.
The existing NIMBLE framework for hierarchical statistical models and algorithms comprises a model specification language, a language for programming model-generic algorithms within the R statistical environment, and a compiler that generates, compiles, and interfaces to model- and algorithm-specific C++ for efficient execution. These components enable general implementation and dissemination of methods such as Markov chain Monte Carlo, sequential Monte Carlo, and many related methods. In this project NIMBLE will be extended and generalized to be more powerful and flexible, enabling use in a variety of software workflows. Extensions to NIMBLE's core capabilities will include harnessing automatic differentiation and parallelization in generated C++, enhancements to its existing linear algebra capabilities, more efficient implementation of large statistical models, including those with structural uncertainty such as latent group membership, and extensions to the statistical modeling language. Enhancements to facilitate integration of NIMBLE-generated models and algorithms with other software will include generation of stand-alone executables, generation of clearly defined application programming interfaces, such as for use from Python, features to call user-provided libraries from algorithm code, features to load and save data via standard formats such as JSON and NetCDF, and separation of NIMBLE components into distinct packages. The project will include substantial outreach, training, and user community development. These activities will include development of use cases in fields such as population and ecosystem ecology, oceanography, climate science, political science, and education. They will also include workshops, user meetings, key-user visits, and training material. This award by the Advanced Cyberinfrastructure Division is jointly supported by the NSF Directorate for Mathematical and Physical Sciences (Division of Mathematical Sciences).
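To illustrate what "model-generic" means in the description above, the following is a minimal sketch of a random-walk Metropolis sampler, one simple form of Markov chain Monte Carlo. It is written in plain Python and is not NIMBLE code or NIMBLE's API; the names `random_walk_metropolis` and `normal_log_density` are hypothetical and chosen only for this illustration. The sampler sees the model only through a log-density function, so the same algorithm can be reused across different models.

```python
import math
import random


def random_walk_metropolis(log_density, initial, n_samples, step_size=1.0, seed=0):
    """Draw samples from an unnormalized log density with random-walk Metropolis.

    The sampler only ever calls log_density(x), so it does not depend on any
    particular model. (Hypothetical helper for illustration, not NIMBLE's API.)
    """
    rng = random.Random(seed)
    current = initial
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current value.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def normal_log_density(mean, sd):
    """Return the log density (up to an additive constant) of a normal distribution."""
    def log_density(x):
        return -0.5 * ((x - mean) / sd) ** 2
    return log_density


# The same sampler is reused, unchanged, for two different models.
samples_a = random_walk_metropolis(normal_log_density(0.0, 1.0), 0.0, 5000)
samples_b = random_walk_metropolis(normal_log_density(10.0, 2.0), 10.0, 5000)

print(sum(samples_a) / len(samples_a))  # sample mean, expected to be near 0
print(sum(samples_b) / len(samples_b))  # sample mean, expected to be near 10
```

In this sketch the model is a single function. A framework of the kind described above would instead derive the density from a model specification and generate compiled code, but the separation between the algorithm and the model is the same idea.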