SCience-INtegrated Predictive modeLing (SCINPL): a novel framework for scalable and interpretable predictive scientific modeling

$200,000 | FY2022 | MPS | NSF

Duke University, Durham NC

Abstract

Scientific modeling is at a critical and defining crossroads. With breakthroughs in experimental technology, high-quality data can now be obtained for complex scientific and engineering problems. However, generating such data entails large experimental and computational costs, so only limited data are available for scientific investigation. While predictive modeling provides some relief, recent work has revealed two key shortcomings of existing models: they often perform poorly when trained on limited data, and they can violate established scientific principles, which may lead to erroneous and spurious scientific conclusions. This project will develop a novel SCience-INtegrated Predictive modeLing (SCINPL) framework that addresses these limitations. SCINPL paves the way for transformative scientific research, equipping practitioners with accurate, cost-efficient and interpretable predictive models for guiding scientific progress. The framework can catalyze closer collaboration between the scientific and data science communities by demonstrating the practical advantages of science-driven statistical learning and data-driven scientific discovery. SCINPL provides a radical paradigm shift for scientific discovery in a broad range of fields, enabling scientists to push forward the frontiers of scientific knowledge and engineering via improved science-based data science tools. SCINPL features a suite of new probabilistic Bayesian models that can integrate a wide range of scientific domain knowledge as prior beliefs for predictive modeling. Integrating scientific knowledge with data-driven models not only improves predictive performance and reduces uncertainty, but also yields better interpretability and thus supports scientific discovery, even with limited training data.
The first model, the Boundary-constrained Gaussian process (GP) model, integrates known boundary information about the response surface within a GP framework. The second model, the Graphical Multi-fidelity GP model, embeds dependency information between scientific models into predictive modeling. The third model, the Gaussian Process Subspace regression model, integrates subspace information representing dominant physics into GP modeling. For each model, the investigators will (i) establish a solid theoretical foundation showing that integrating scientific information improves predictive performance, (ii) present a comprehensive methodological framework and an efficient suite of algorithms for integrating scientific principles within probabilistic models, and (iii) demonstrate the usefulness of these models for cost-efficient, interpretable and principled scientific discovery. Major emphasis is placed on demonstrating the effectiveness of SCINPL in tackling a broad range of complex and expensive scientific problems, including the design of 3D-printed aortic valves, the study of heavy-ion collisions, and the optimization of rocket engines for spaceflight. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
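The abstract names the Boundary-constrained GP model without describing its construction. As a rough, hypothetical illustration of what "integrating known boundary information within a GP framework" can mean, the sketch below conditions an ordinary GP exactly on known values at the two endpoints of [0, 1] before fitting noisy interior data. This is a swapped-in textbook technique (exact conditioning on boundary points), not the investigators' model. The squared-exponential kernel, its length scale, the boundary values f(0) = 1 and f(1) = 2, the noise level, the sample data, and the helper names `boundary_mean`, `boundary_kernel` and `predict` are all assumptions made for illustration.

```python
import math

def rbf(x, y, length=0.2, var=1.0):
    """Squared-exponential kernel on scalar inputs (assumed base kernel)."""
    return var * math.exp(-0.5 * ((x - y) / length) ** 2)

def solve_spd(a, b):
    """Solve a @ x = b for a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Hypothetical boundary information: f(0) = 1 and f(1) = 2 are known exactly.
BOUNDARY_X = [0.0, 1.0]
BOUNDARY_Y = [1.0, 2.0]
KBB = [[rbf(a, b) for b in BOUNDARY_X] for a in BOUNDARY_X]
MEAN_WEIGHTS = solve_spd(KBB, BOUNDARY_Y)

def boundary_mean(x):
    """GP mean after noise-free conditioning on the boundary values."""
    return sum(rbf(x, b) * w for b, w in zip(BOUNDARY_X, MEAN_WEIGHTS))

def boundary_kernel(x, y):
    """GP covariance after conditioning; it is zero whenever x or y is 0 or 1."""
    proj = solve_spd(KBB, [rbf(y, b) for b in BOUNDARY_X])
    return rbf(x, y) - sum(rbf(x, b) * p for b, p in zip(BOUNDARY_X, proj))

def predict(x_train, y_train, x_new, noise=1e-2):
    """Posterior mean and variance at x_new given noisy interior observations."""
    K = [[boundary_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    resid = [yv - boundary_mean(xv) for xv, yv in zip(x_train, y_train)]
    alpha = solve_spd(K, resid)
    k_star = [boundary_kernel(x_new, b) for b in x_train]
    mean = boundary_mean(x_new) + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_spd(K, k_star)
    var = boundary_kernel(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var

# Made-up interior observations, for illustration only.
xs = [0.2, 0.4, 0.6, 0.8]
ys = [1.3, 1.1, 1.6, 2.1]
for x in (0.0, 0.5, 1.0):
    m, v = predict(xs, ys, x)
    print(f"x={x:.1f}  mean={m:.3f}  var={v:.2e}")
```

Because the boundary values enter as exact observations, the posterior mean reproduces f(0) and f(1) and the posterior variance vanishes at both endpoints, a small instance of the reduced uncertainty the abstract attributes to integrating scientific knowledge. In more than one dimension the boundary is a continuum rather than two points, so this finite conditioning no longer applies directly.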