Privacy-Aware Federated Learning for Breast Cancer Risk Assessment

$732,399 | U24 | FY2025 | CA | NIH

Indiana University Indianapolis, Indianapolis IN

Investigators

Linked publications & trials

Paper 38750041 Paper 38347141 Paper 38347140 Paper 38228979

Abstract

Federated learning (FL) has recently gained considerable attention, as it enables analyses of data from numerous sites without the need to share data, i.e., each collaborator's data are always retained within their site. FL is advantageous as it can: 1) overcome cultural/ownership, privacy, and regulatory concerns (since data never leave the local site), 2) provide access to restricted data, 3) allow the collection of meaningful amounts of data for analyses of rare diseases, and 4) address all patient populations. FL can thus be regarded as a novel paradigm for multi-site collaborations, enabling access to ample data acquired under varying equipment and protocols, which is essential to developing robust, generalizable models. To this end, we have developed the Federated Tumor Segmentation (FeTS) platform and the Open Federated Learning (OpenFL) library as open-source tools with a commercially friendly license. These tools have facilitated a) the largest to-date real-world federation, involving 3D brain tumor MRI data from 71 sites across 6 continents, and b) the very first computational challenge in FL, forming the first benchmarking environment and dataset in the field. This FeTS-OpenFL infrastructure has further been used to c) identify tumor-infiltrating lymphocytes in histopathology images and d) segment dense tissue in 2D digital mammography (DM), highlighting its generalizability across imaging and disease types. Building upon our successful FeTS-OpenFL infrastructure, we propose to enhance its functionality with new developments on privacy-aware FL for classification workloads and to evaluate it on a first-of-its-kind use case in breast cancer (BC) risk assessment. BC is the most diagnosed cancer in the US and the 2nd leading cause of death from cancer in women, and screening is performed routinely with DM for women in their 40s-50s. However, DM yields many false positives and unnecessary subsequent procedures.
To alleviate these issues, 3D Digital Breast Tomosynthesis (DBT) has been developed and is increasingly replacing DM. Our group has developed novel volumetric breast density (VBD) measures from DBT scans. Building upon our team's collective pioneering work in FL and BC risk assessment, in this proposal we focus on developing a trustworthy FL framework, built on a zero-code principle, for training AI-based classification models, with built-in functionality to i) generate realistic synthetic data, matching local population characteristics, for data augmentation and privacy preservation, and ii) automatically determine quantitative and interpretable settings of optimal privacy preservation. We will use this framework to perform the largest to-date evaluation of training deep-learning models for BC risk assessment using DBT VBD measures and other established risk factors, while leveraging multi-site data of women undergoing BC screening across 5 U.S. states. We will also disseminate resources via distribution of source code, deployment to collaborating sites, and organization of training activities. Our overarching goal is an easy-to-use, translatable, trustworthy FL framework that lowers the barrier for participation in large-scale FL studies and paves the way to accelerated discovery in healthcare.
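
To make the core idea of the abstract concrete (each site trains on its own data and only model parameters leave the site), the following is a minimal sketch of one common aggregation scheme, sample-weighted federated averaging, applied to a toy binary classifier. It is illustrative only and assumes details the abstract does not state: it does not use the FeTS or OpenFL APIs, the aggregation rule is an assumption, the data are synthetic, and all function names (`local_update`, `federated_average`, `make_site`, `accuracy`) are hypothetical.

```python
import math
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of logistic-regression SGD on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w

def federated_average(site_weights, site_sizes):
    """Aggregate site models, weighting each by its number of local samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]

def make_site(rng, n):
    """Hypothetical synthetic site data: the label is 1 when x0 + x1 > 0."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry is a bias term
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data

def accuracy(w, data):
    """Fraction of samples whose predicted class matches the label."""
    correct = 0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        correct += int((z > 0) == (y == 1))
    return correct / len(data)

# Three simulated sites of different sizes; raw data never leave `sites[k]`.
rng = random.Random(0)
sites = [make_site(rng, n) for n in (40, 80, 120)]

global_w = [0.0, 0.0, 0.0]
for _ in range(10):  # federation rounds
    # Each site starts from the current global model and trains locally.
    local_models = [local_update(global_w, d) for d in sites]
    # Only the model weights and sample counts are sent for aggregation.
    global_w = federated_average(local_models, [len(d) for d in sites])

for k, d in enumerate(sites):
    print(f"site {k}: accuracy of global model = {accuracy(global_w, d):.2f}")
```

In this sketch the aggregator sees only weight vectors and sample counts. The privacy-preserving mechanisms the proposal describes, such as synthetic data generation and tuned privacy settings, would sit on top of a loop like this and are not modeled here.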
