RI: Small: Causal Structure Discovery from Diverse Data

$574,140 | FY2024 | CSE | NSF

Purdue University, West Lafayette IN

Investigators

Abstract

Causal reasoning from data is critical in many domains, from medicine to computer software security. Recently, the role of causality in machine learning (ML) has come into focus because ML solutions over-rely on correlations, which limits their ability to generalize. To tackle this problem, researchers train models on data from multiple environments to extract features that remain useful across domains. Other studies have suggested that the causal relations between features can be leveraged to train robust models that generalize. However, unlike these ML methods, which can use any collection of datasets, most existing causal discovery algorithms rely heavily on the assumption that interventional data, such as data from a randomized controlled trial, are available. In practice, datasets from different environments may carry common causal knowledge without arising from well-defined interventions. Methods to systematically extract such shared causal knowledge from data across domains are currently missing, which prevents ML solutions from leveraging causal structure explicitly. The goal of this project is to address this gap by developing novel algorithms that can extract cause-effect relations from unstructured, diverse datasets. The project outcomes are expected to unlock the potential of causal reasoning in data-rich domains where data from different environments are available, and to significantly widen the use of causal discovery among ML practitioners.

The project is organized into three thrusts. In the first thrust, the investigator will characterize the fundamental limits of how much causal knowledge can be extracted from diverse datasets under minimal assumptions about the data-generating process. In the second thrust, the investigator and his team will develop causal discovery algorithms that achieve these fundamental limits on such diverse datasets. In the final thrust, the proposed discovery algorithms will be evaluated across a wide range of datasets by measuring their performance on the downstream ML tasks that the learned causal structure enables.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
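The idea that datasets from different environments can carry shared causal knowledge, even when no well-defined intervention is recorded, can be illustrated with a toy example. The sketch below is NOT the project's algorithm; it is a minimal illustration of a generic invariance heuristic, under loudly stated assumptions: a linear-Gaussian model X -> Y with no hidden confounders, where each environment shifts only the distribution of the cause X while the mechanism Y = 2X + noise stays fixed. Under those assumptions, the regression of Y on X stays stable across environments, while the reverse regression of X on Y drifts. All function names (`simulate`, `ols_slope`, `slope_spread`, `guess_direction`) and the parameter values are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate(n, x_mean, x_std, seed):
    """Hypothetical environment: X's distribution varies, Y = 2X + N(0, 1) does not."""
    rng = random.Random(seed)
    xs = [rng.gauss(x_mean, x_std) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def ols_slope(inputs, outputs):
    """Ordinary least-squares slope of `outputs` regressed on `inputs`."""
    mean_in = statistics.fmean(inputs)
    mean_out = statistics.fmean(outputs)
    cov = sum((a - mean_in) * (b - mean_out) for a, b in zip(inputs, outputs))
    var = sum((a - mean_in) ** 2 for a in inputs)
    return cov / var


def slope_spread(envs, forward):
    """Range of the fitted slope across environments (small = more invariant)."""
    slopes = [ols_slope(x, y) if forward else ols_slope(y, x) for x, y in envs]
    return max(slopes) - min(slopes)


def guess_direction(envs):
    """Invariance heuristic: prefer the direction whose conditional is more stable."""
    forward = slope_spread(envs, forward=True)    # spread of Y-on-X slopes
    backward = slope_spread(envs, forward=False)  # spread of X-on-Y slopes
    return "X -> Y" if forward < backward else "Y -> X"


# Three hypothetical environments that differ only in the mean and spread of X.
envs = [
    simulate(5000, 0.0, 0.5, seed=1),
    simulate(5000, 2.0, 1.5, seed=2),
    simulate(5000, -1.0, 3.0, seed=3),
]
print(guess_direction(envs))
```

With these settings, the Y-on-X slope has expected value 2 in every environment, whereas the expected X-on-Y slope is 2·Var(X) / (4·Var(X) + 1), which changes as Var(X) changes, so the heuristic should favor X -> Y. Real diverse datasets rarely satisfy such clean assumptions (confounding, nonlinearity, and shifts in several mechanisms at once), which is why characterizing what can be identified under minimal assumptions is a research question in its own right.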
