Evidence-based guidance to support internal and external validity of AI healthcare datasets

$490,452 | R01 | FY2025 | HG | NIH

Stanford University, Stanford CA

Investigators

Nicole A Martinez-Martin (contact), Mildred K. Cho

Abstract

There is widespread agreement that generalizability and the mitigation of systematic error are fundamental issues for the ethical development of artificial intelligence for health care (AI-HC). In the U.S., the National Institutes of Health has launched initiatives to address these issues. However, it is not yet clear whether these initiatives are sufficient to achieve the goals of generalizability, mitigation of systematic error, and external and internal validity in AI-HC. The concepts and practices associated with these goals shape the extent to which AI-HC researchers and developers achieve generalizability and mitigate systematic error in their models. Yet insufficient understanding of how concepts such as systematic error and generalizability are defined and put into practice within AI-HC projects is an impediment to achieving these goals for AI-HC datasets. Prior studies of biomedical research indicate that scientists often hold differing concepts of generalizability, systematic error, and internal and external validity. In turn, these concepts shape implementation of the practices used to achieve the related goals, such as technical approaches to reduce systematic error or efforts to address the composition of datasets. We propose to assess how issues relevant to generalizability and systematic error are conceptualized and put into practice in 50 NIH-funded AI-HC research projects. We will employ a "microethics" perspective, which focuses analysis on how high-level ethical goals are understood and put into practice in technical fields. This perspective allows examination of how different actors (e.g., data scientists, clinicians, annotators) involved in AI dataset development perceive relevant issues, such as how internal and external validity are assessed, trade-offs between scientific and dataset composition goals, and the downstream impacts on the validity of datasets.
Informed by these findings, we will develop evidence-informed practical guidance for the future creation of datasets in AI-HC that support generalizability, mitigation of systematic error, and internal and external validity. We will use a multi-pronged strategy to encourage uptake and implementation of guidance through the Bridge2AI program, the Center for ELSI Resources and Analysis, and interactive, case-based design exercises and workshops. [Modified
