CICI: IPAAI: A Data Provenance Framework for Medical Machine Learning Research

$900,000 · FY2025 · CSE · NSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Yuan Tian (contact), Aichi Chien, Yanyan Zhuang

Abstract

Artificial intelligence (AI) systems that read clinical notes and medical images promise earlier diagnoses, personalized treatments, and lower costs. However, these systems face critical challenges that threaten their reliability and ethical use. Data integrity problems, such as errors or tampering, can distort models and endanger patient care. In addition, patient data may have to be withdrawn because consent is revoked or because of legal obligations. Today there is no reliable way to determine where a medical model's training data came from, whether that data was tampered with, or how to delete patient records from a trained model effectively and efficiently. This project will create the first end-to-end provenance framework for medical AI that enables tracing, auditing, and, when necessary, removing data effectively and efficiently. The results will strengthen patient privacy, improve the reliability of medical AI, and provide open-source tools for trustworthy AI. Building this framework is challenging because medical datasets are unstructured, multimodal, and dynamic, and they come from many providers. Further, removing tainted data can force full model retraining and can degrade model performance. To address these challenges, this research has three thrusts: (1) automated inference of whether public machine learning (ML) models were trained on corrupt datasets, (2) efficient logging of dataset and ML model usage in medical research workflows, and (3) efficient machine unlearning to remove compromised or sensitive data points without retraining. This project advances the foundations of secure medical data provenance and machine unlearning, and it provides open-source tools and coursework that prepare the next generation of medical and computer scientists to build trustworthy AI. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
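The abstract does not say how thrust (2) will log dataset and model usage. One common way to make such a log tamper-evident is hash chaining: each entry stores a SHA-256 digest that commits to both its own contents and the previous entry's digest, so editing any past entry breaks every later link. The Python sketch below illustrates that idea under stated assumptions. The class `ProvenanceLog`, its methods `append`, `verify`, and `models_using`, and the event fields (`action`, `model`, `records`) are hypothetical names chosen for illustration; they are not the project's tools or data model.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class ProvenanceLog:
    """Hypothetical append-only log; each entry's hash commits to the previous one."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON (sorted keys) so identical events always hash identically.
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited entry breaks the match.
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True

    def models_using(self, record_id: str) -> set:
        # Audit query: which models were trained on a given patient record?
        return {
            e["event"]["model"]
            for e in self.entries
            if e["event"].get("action") == "train"
            and record_id in e["event"].get("records", [])
        }


if __name__ == "__main__":
    log = ProvenanceLog()
    log.append({"action": "ingest", "dataset": "notes-v1", "records": ["p001", "p002"]})
    log.append({"action": "train", "model": "cls-a", "records": ["p001", "p002"]})
    log.append({"action": "train", "model": "cls-b", "records": ["p002"]})
    print(log.verify())                      # True
    print(sorted(log.models_using("p001")))  # ['cls-a']
    log.entries[1]["event"]["records"] = ["p002"]  # simulate tampering
    print(log.verify())                      # False
```

A query like `models_using` shows how such a log could support the withdrawal scenario in the abstract: when a patient revokes consent, it identifies which models would need unlearning. Hash chaining alone only detects edits if the latest digest is also stored somewhere an attacker cannot rewrite, so a real deployment would likely anchor that digest externally.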
