Elements: CausalBench: A Cyberinfrastructure for Causal-Learning Benchmarking for Efficacy, Reproducibility, and Scientific Collaboration

$599,905 | FY2023 | CSE | NSF

Arizona State University, Scottsdale AZ

Investigators

Abstract

While artificial intelligence (AI) and machine learning (ML) technologies are achieving exceptional success in many applications, users are starting to notice a critical shortcoming of current approaches: they are not causally grounded. Causal learning, although relatively recent, aims to go far beyond conventional machine learning and is emerging as a vibrant field with new opportunities and challenges. Yet advances in this field are hampered by the lack of cyberinfrastructure platforms with unified benchmark datasets, algorithms, metrics, and evaluation service interfaces for causal learning. Reproducible science is possible only when outcomes can be quantified and compared to other approaches, and a lack of reproducibility raises serious concerns about the validity of published research. Such reproducibility can only be achieved through open platforms for data, algorithm, and model exchange and evaluation. Therefore, CausalBench, a transparent, fair, and easy-to-use evaluation platform, provides the key functionalities necessary to establish trust in causal learning’s innovation, collaboration, and critical applications, including public health and sustainability. CausalBench is a novel cyberinfrastructure of benchmarking data, algorithms, models, and metrics for causal learning, serving the needs of a broad range of scientific and engineering disciplines and sustaining discovery across all fields. The cyberinfrastructure advances research in causal learning by facilitating scientific collaboration on novel algorithms, datasets, and metrics, and it promotes scientific objectivity, reproducibility, fairness, and awareness of bias in causal learning research.
CausalBench includes (1) an “ontology” for benchmarking to standardize the evaluation methodology, improve transparency, and promote collaboration to efficiently advance causal learning, (2) standard and convenient mechanisms for the community to contribute data and models such that disparate datasets can be integrated in a standard way, and (3) integrated evaluation standards that can help assess algorithms for novel problems in the emerging field of causal learning with observational data. The project trains PhD students in causal discovery and causally aware data management challenges via integrative, cross-disciplinary approaches and prepares future researchers with skills in data-intensive AI and machine learning systems. The project further provides an excellent context for master’s, undergraduate, and K-12 students to become aware of AI and machine learning and their potential impacts on urgent societal challenges, including public health and sustainability. This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Information and Intelligent Systems within the Computer and Information Science and Engineering Directorate. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
