Disciplinary Improvements: AI Readiness, Reproducibility, and FAIR: Connecting Computing and Domain Communities Across the ML Lifecycle

$1,260,000 | FY2022 | CSE | NSF

University of California-San Diego, La Jolla, CA

Abstract

This research coordination network (RCN) will build better practices based on the FAIR data principles in multiple disciplinary communities, focusing on three themes: FAIR in machine learning, AI readiness, and reproducibility. These themes were chosen to address the urgent needs of researchers in the geophysical and computer sciences. A key problem addressed is that machine learning models are often disseminated with default “best” parameters (e.g., pre-trained models) but lack permanent identifiers and documentation of model training and of data preparation for training. This hinders the scientific reproducibility of machine learning in the geosciences and often leads to underestimating the variance in model outputs. The project will build on prior successes in the geophysical community and use existing networks to build relationships in this new research coordination network, thereby creating a network of networks. Experts and affinity groups related to machine learning will be convened to understand emerging best practices, identify which tools and resources to leverage, and determine how to stimulate experimentation that quantifies the relationship between the FAIRness of data and how easily and efficiently machine learning algorithms can be applied, while also advancing reproducibility. Geosciences data repositories will be better equipped to support their users in the preparation, deposit, access, and reuse of data using machine learning methods. The RCN will also develop a roadmap to guide community-led efforts that direct attention and funding to areas where an application of FAIR data principles and open science in AI research is needed. The project will host community events and working groups to gather issues and practices for all three themes.
The target community includes geophysical data archive providers, data archives created by researchers themselves, machine learning scientists and practitioners, high-performance computing centers, and existing organizations focused on FAIR data, big data, and AI. The team will coordinate community activities and create a series of reports including both retrospective and forward-looking guidelines. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
