Collaborative Research: SCH: Fair Federated Representation Learning for Breast Cancer Risk Scoring

$350,000 | FY2022 | CSE | NSF

University of Illinois at Urbana-Champaign, Urbana, IL

Abstract

With the availability of electronic health records (EHRs) in hospitals and clinics, powerful machine learning models can be developed to support precision population health and clinical decision-making tasks such as disease detection, outcome prediction, and treatment recommendation. This project creates a machine learning framework for training models across hospitals and new tools for incorporating fairness into distributed machine learning. The project will evaluate the applicability of these algorithmic innovations to real-world precision population health, with a primary focus on addressing screening and treatment disparities in breast cancer, along with additional evaluation on other healthcare applications. The project will conclude with collaborative development and deployment across multiple academic and medical institutions and will include curriculum development on fairness in machine learning and federated machine learning. It also plans to involve graduate students from underrepresented groups.

This project will focus on representation learning approaches for training EHR models, in which deep learning models are trained to produce embedding vectors that represent clinical concepts (e.g., diagnoses and medications) and patient data. The resulting embedding vectors can serve as input to downstream applications, such as breast cancer risk scoring. The project creates a transformative new direction for fairness in machine learning for healthcare by tackling two challenges: mitigating model bias and mitigating data bias. The first challenge is model bias: most representation learning algorithms in healthcare do not consider any fairness measures, which can lead to biased embeddings. To this end, this project develops a fair representation learning algorithm that can be adapted to various fairness metrics.
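
To make "fairness metrics" concrete, the following is a minimal sketch of two commonly used group fairness measures computed on binary risk predictions. The abstract does not name the metrics the project targets; demographic parity and equal opportunity are assumed here purely as familiar examples, and the function names and toy data are hypothetical.

```python
from typing import Hashable, Sequence


def positive_rate(preds: Sequence[int]) -> float:
    """Fraction of predictions equal to 1 (0.0 for an empty list)."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_gap(preds: Sequence[int],
                           groups: Sequence[Hashable]) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [
        positive_rate([p for p, g in zip(preds, groups) if g == group])
        for group in set(groups)
    ]
    return max(rates) - min(rates)


def equal_opportunity_gap(preds: Sequence[int],
                          labels: Sequence[int],
                          groups: Sequence[Hashable]) -> float:
    """Largest difference in true-positive rate between any two groups."""
    rates = [
        positive_rate([p for p, y, g in zip(preds, labels, groups)
                       if g == group and y == 1])
        for group in set(groups)
    ]
    return max(rates) - min(rates)


# Hypothetical toy data: 8 patients, two demographic groups "a" and "b".
preds = [1, 0, 1, 1, 0, 0, 1, 0]   # model's high-risk flags (assumed)
labels = [1, 0, 1, 0, 1, 0, 1, 1]  # observed outcomes (assumed)
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_gap(preds, groups)
eo_gap = equal_opportunity_gap(preds, labels, groups)
```

A gap of zero means the groups are treated identically under that measure. An algorithm that "can be adapted to various fairness metrics" could, under this reading, accept any such gap function as a constraint or penalty during training.
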
The second challenge is data bias: the distributed nature of the data limits both the downstream equity and the generalization performance of the resulting embedding vectors. This project addresses data bias with a new fair federated representation learning framework that learns representations satisfying fairness criteria by training jointly across multiple sites without sharing patient data. In addition to developing the algorithmic and theoretical frameworks for these directions, this project will also build and release open software.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
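
The idea of training jointly across sites without sharing patient data can be illustrated with standard federated averaging. This is a generic sketch, not the project's framework: it omits any fairness criterion, it uses a one-parameter linear model in place of a representation learner, and the names `local_update` and `federated_round` and the toy data are all hypothetical.

```python
from typing import List, Tuple

Site = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(w: float, data: Site, lr: float = 0.1, steps: int = 5) -> float:
    """Run a few least-squares gradient steps for y ~ w * x on one site's data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, sites: List[Site]) -> float:
    """One round: each site trains locally; the server averages the returned
    weights, weighted by each site's sample count. Raw records never leave a site."""
    total = sum(len(site) for site in sites)
    return sum(local_update(w, site) * len(site) for site in sites) / total


# Hypothetical toy data: two sites whose private records all follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0), (0.5, 1.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
```

Only the scalar weight `w` is exchanged between the sites and the server. A fair federated variant would, presumably, add a fairness term to the local objective or to the aggregation step, but the abstract does not specify how.
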
