
Collaborative Research: GEO OSE Track 2: Project Pythia and Pangeo: Building an inclusive geoscience community through accessible, reusable, and reproducible workflows

$710,411, FY2024, GEO, NSF

University Corporation for Atmospheric Research, Boulder, CO

Investigators

Abstract

This project builds upon the demonstrable success of two community-driven open geoscience efforts: Pangeo and Project Pythia. Pangeo has advanced transformative platforms and paradigms for “Big Data” geoscience in the cloud; Pythia has built open, interactive learning resources for Python-based geoscience workflows built on public cloud datasets that “just work” for users. The central goal of this project is to reduce barriers to scientific progress by building community around shared scientific workflow knowledge, using the Pythia Cookbook format. “Cookbooks” imply collections of recipes for transforming raw ingredients (publicly available data) into scientifically useful results. Cookbooks are based on Jupyter notebooks but explicitly tied to reproducible computational environments and supported by a rich infrastructure enabling collaborative authoring and automated health-checking – essential tools in the struggle against the widespread problem of notebook obsolescence. Open-access, cloud-based Cookbooks are a democratizing force for science. By growing the collection of Pythia Cookbooks while growing the community of users, contributors, and maintainers, this project will grow the capacity of current and future geoscientists to practice open science within the rapidly evolving open science ecosystem.

Project Pythia exists to house, share, and accelerate the development of high-quality learning resources for Python-based computing in the geosciences. This project will advance geoscience research and education by developing, documenting, and propagating best practices for highly scalable reproducible data analysis in the cloud using the open-source ecosystem. The Pythia team will develop science-driven exemplar Cookbooks demonstrating highly scalable versions of common analysis workflows on high-value datasets across numerous geoscience domains, with content chosen to accelerate community awareness and participation. Content will span disciplines including Atmospheric Science, Physical Oceanography, Hydrology, Glaciology, Climate Science, and applications of Machine Learning. Infrastructure will be deployed for performant data-proximate Cookbook authoring, testing, and use, on both commercial and NSF-funded cloud platforms. Both the Cookbook collection and the community of user-contributors will broaden and grow through annual workshops, outreach, and classroom use, with recruitment specifically targeting under-served communities. Priorities will be guided by an independent steering board; sustainability will be achieved by nurturing a vibrant, inclusive community backed by automation that lowers barriers to participation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
