CRII: SCH: Towards robustness to data disparities: a framework for efficient and reliable data-driven decision-making tools for all

$175,000 | FY2022 | CSE | NSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI

Investigators

Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

One of the most promising applications of machine learning (ML) is guiding personalized decision-making, especially in healthcare. Predictive ML models can help clinicians identify patients at high risk of adverse outcomes, enabling them to make informed decisions about preventative measures. Causal ML models can help clinicians and patients understand the effects of interventions, enabling them to make more informed decisions about treatment options. Importantly, the reliability of predictive and causal ML methods depends on the quality of the data used to develop them. Unfortunately, data quality often reflects systemic inequalities in both access to and quality of healthcare, leading to data disparities. Examples of unequal quality of care include settings in which Black patients are less likely to receive referrals to specialists or in which women’s pain is less likely to be taken seriously, both of which can delay diagnosis and treatment. As a result, data collected from specific subgroups of the population are more prone to missingness. In terms of access, data reveal that Black and Hispanic groups are more likely to be uninsured and less likely to have a usual place to go for medical care. This results in the underrepresentation of these subgroups in observational data, such as the electronic health records typically used to develop ML models.

In this proposal, we will develop and theoretically analyze robust ML methods (both predictive and causal) that ameliorate the effects of data disparities. The proposed research has two main prongs. The first prong focuses on developing prediction tools for diagnosis that are robust to inaccuracies due to the underrepresentation of minorities. We will develop model training methods that discourage models from learning patterns that reflect data biases rather than true causal mechanisms, and we will theoretically analyze the robustness and efficiency of these models. The second prong focuses on developing methods for estimating the causal effects of interventions that are robust to data missingness and measurement error. While most existing work attempts to estimate a single value for the causal effect of an intervention, this project will study the estimation of intervals or bounds on the causal estimates that reflect the uncertainty in the quality of the collected data. We will theoretically analyze the credibility and tightness of our bounds when they are estimated from limited data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
