Collaborative Research: SaTC: CORE: Medium: PREMED: Privacy-Preserving and Robust Computational Phenotyping using Multisite EHR Data

$900,000 | FY2021 | CSE | NSF

Emory University, Atlanta GA

Investigators

Abstract

Tensor analysis offers an effective approach to convert massive Electronic Health Records (EHRs) into meaningful and interpretable clinical concepts, or phenotypes, such as diseases and disease subtypes. It can cluster patients into subgroups and capture the interactions between multiple attributes (e.g., specific procedures used to treat a disease), enabling precision medicine. Effective phenotyping requires a large number of diverse samples to avoid potential population bias. A major challenge is how to derive phenotypes jointly across multiple institutions while preserving individual patients' privacy at each site. The goal of this project is to develop a federated tensor factorization framework for Privacy-preserving, Robust, and Efficient computational phenotyping using Multisite EHR Data (PREMED). While many federated learning techniques have been developed for each of these goals (privacy, robustness, and efficiency), their synergy has not been well studied. Communication-efficient techniques such as compression have an intrinsic benefit to privacy (lower disclosure risk) and robustness (smaller adversarial impact) because the communication is compressed and obfuscated. Further, federated tensor factorization presents unique challenges due to its multi-factor structure and unsupervised nature. The project aims to exploit the synergy between efficiency, privacy, and robustness and to address these three interrelated challenges with a holistic approach that utilizes the multi-factor structure of tensor factorization. The research outcome will allow institutions to jointly perform computational phenotyping on their privacy-protected data effectively and efficiently.
This project has three interrelated objectives: (1) developing communication-efficient techniques for federated tensor factorization, such as local Stochastic Gradient Descent (SGD) to reduce communication frequency and multi-level compression methods that leverage the multi-factor structure of tensor factorization to reduce per-round communication; (2) developing privacy-preserving federated tensor factorization methods that exploit the intrinsic privacy benefit of the communication-efficient techniques, as well as privacy-preserving input synthesization methods that offer more versatility; and (3) developing robust statistical aggregation methods that handle potential Byzantine failures and malicious sites by utilizing the intrinsic robustness benefit of the communication-efficient techniques, as well as robust learning-based aggregation methods for sparse settings based on truth inference and adaptive site valuation approaches. The project includes case studies using real EHR data from Emory and UTHealth for phenotype discovery and phenotype-based predictive studies in the context of Alzheimer's Disease and Sepsis. The project also includes a set of synergistic activities, including organization of multi-site computational phenotyping challenges, development of collaborative sidecar courses, and active involvement of students. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
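As a rough illustration of two ideas named in the objectives, compressed communication and robust aggregation, the sketch below uses top-k sparsification and a coordinate-wise median. Both are standard techniques chosen here for illustration only; the abstract does not say which compression or aggregation methods the project uses. The function names, the 2x2 factor updates, and the site values are all hypothetical.

```python
from typing import Sequence

import numpy as np


def top_k_sparsify(update: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of an update; zero the rest.

    Illustrative compression step (assumption, not the project's method).
    """
    flat = update.ravel()
    if k >= flat.size:
        return update.copy()
    # Indices of the k entries with the largest absolute value.
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[keep] = flat[keep]
    return sparse.reshape(update.shape)


def median_aggregate(site_updates: Sequence[np.ndarray]) -> np.ndarray:
    """Coordinate-wise median of per-site updates.

    Illustrative robust aggregation rule (assumption, not the project's method).
    """
    return np.median(np.stack(site_updates), axis=0)


# Hypothetical factor-matrix updates from three honest sites and one
# Byzantine site that reports arbitrarily large values.
base = np.array([[1.0, 2.0], [3.0, 4.0]])
honest = [base + delta for delta in (-0.1, 0.0, 0.1)]
byzantine = np.full((2, 2), 1e6)
updates = honest + [byzantine]

mean_result = np.mean(np.stack(updates), axis=0)  # pulled far off by the outlier
median_result = median_aggregate(updates)         # stays near the honest values

# Compression example: only the two largest-magnitude entries are sent.
compressed = top_k_sparsify(np.array([[0.1, -5.0], [2.0, 0.3]]), k=2)
```

In this toy setting the plain mean is dominated by the single malicious site, while the median stays close to the honest updates. Sending only the top-k entries reduces per-round communication at the cost of some approximation error.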
