CRII: III: Metadata-guided Imbalance-Modeling for Robust Computational Healthcare

$190,960 · FY2023 · CSE · NSF

University of Memphis, Memphis, TN

Investigators

Abstract

Imbalance occurs naturally in health data, from text messages to electronic health records, and it undermines the reliability, robustness, and trustworthiness of computational healthcare models. However, existing methods ignore a fundamental cause of the imbalance: metadata such as demographics (e.g., gender and age), geolocation, and data sources. For example, consider a dataset covering two cancers, lung and breast. Lung cancer is more frequent overall and in males, whereas breast cancer is less frequent than lung cancer overall but occurs more frequently in females, which shows that imbalance patterns vary across metadata (gender in this case). Metadata carries essential information describing the diversity and imbalance of health data. However, few studies have considered the diverse imbalance patterns across metadata factors, which creates urgent needs and unique challenges for robust and reliable imbalance modeling. This project proposes novel learning strategies that use metadata to guide imbalance modeling and incorporate the varied imbalance patterns (e.g., breast cancer frequency for males and females) into the training of machine learning models. The general goal is to create reliable, open-source tools that other health researchers and practitioners can easily adopt. For example, one particular project outcome will be improved machine learning classifiers for late-effect assessments of pediatric cancer treatment at St. Jude Children's Research Hospital. Materials (e.g., publications) and education activities will raise awareness and empower decision-making for health stakeholders with actionable methods for developing and deploying machine learning on imbalanced healthcare data with rich and diverse metadata, such as demographics.

This project will create a novel metadata-guided imbalance-learning framework based on meta-learning that can achieve reliable and robust machine learning across different metadata factors. The investigator will start with one metadata factor at a time, develop novel extensions to joint imbalance learning across multiple metadata factors (e.g., gender and disease category), and propose a self-adapting weighting mechanism to balance different metadata factors and prevent meta-learning overfitting. Finally, the investigator will propose an unsupervised generative model to infer missing metadata attributes, which works jointly with the imbalance-learning framework. While the framework generally aims to promote model robustness, the method can also apply to demographic fairness because it aims to achieve balanced performance across demographic groups. This project will examine and evaluate the proposed framework on a variety of health data through 1) new settings on different metadata factors and 2) the effects and sensitivities of metadata factors for imbalance learning. Specific deliverables include a novel meta-learning toolkit with broad utility and educational activities to train the next-generation computational healthcare workforce. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
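
The idea that imbalance patterns differ across metadata groups can be illustrated with a small sketch. The code below is not the project's meta-learning framework. It is a simple stand-in that assumes plain inverse-frequency reweighting, computed once over the whole dataset and once within each metadata group. The function names, the toy counts, and the group labels "M" and "F" are all hypothetical and chosen only to echo the lung/breast cancer example in the abstract.

```python
from collections import Counter


def global_weights(labels):
    """Inverse-frequency weight per sample, ignoring metadata: n / (k * n_y)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def group_conditional_weights(labels, groups):
    """Inverse-frequency weight per sample, computed inside its metadata group.

    For a sample with label y in group g the weight is n_g / (k_g * n_{g,y}),
    so the weights within each group average to 1.
    """
    pair_counts = Counter(zip(groups, labels))
    group_counts = Counter(groups)
    classes_in_group = Counter(g for g, _ in pair_counts)  # distinct labels per group
    return [
        group_counts[g] / (classes_in_group[g] * pair_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Synthetic toy data (assumed counts, not real statistics).
groups = ["M"] * 10 + ["F"] * 10
labels = ["lung"] * 8 + ["breast"] * 2 + ["lung"] * 3 + ["breast"] * 7

flat = global_weights(labels)
weights = group_conditional_weights(labels, groups)

# Collect one weight per (group, label) pair for display.
by_pair = {(g, y): w for g, y, w in zip(groups, labels, weights)}
for pair in sorted(by_pair):
    print(pair, round(by_pair[pair], 3))
```

In this toy data the dataset-wide weights treat breast cancer as the minority class for everyone, while the group-conditional weights up-weight breast cancer only for the "M" group and up-weight lung cancer for the "F" group. That difference is the kind of metadata-dependent imbalance pattern the abstract describes.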
