eMB: Explainable and Physics-Informed Machine Learning for Cell Typing via a Modern Optimization Lens
Purdue University, West Lafayette IN
Abstract
Human-induced pluripotent stem cells (hiPSCs) represent a groundbreaking advancement in stem cell research. Derived from skin or blood cells, hiPSCs are reprogrammed to an embryonic-like state, enabling them to differentiate into any cell type, such as blood cells, immune cells, heart cells, and neurons. This Nobel Prize-winning technology circumvents the ethical issues associated with human embryonic stem cells and provides valuable models for studying human development, disease, drug testing, and potential cell-based therapies. However, significant challenges must be overcome before hiPSCs can be used in clinical settings and large-scale manufacturing. One major challenge is accurately identifying cell types at different stages of differentiation, which is crucial for ensuring that the cells perform their intended functions. Traditional experimental methods for cell identification can be costly, time-consuming, and limited in robustness. This research aims to address these challenges by developing explainable and physics-informed machine learning models. These models are intended to improve the accuracy and reliability of cell type identification so that hiPSC technology can be widely adopted in clinical and industrial applications, ultimately benefiting society through improved healthcare solutions and a deeper understanding of human biology.
The project will involve both graduate and undergraduate students, with graduate students focusing on core theory and method development while undergraduates investigate applications. The PIs will work with Purdue's Research Experience for Undergraduates (REU) and Summer Vertically Integrated Projects (VIP) programs to mentor additional underrepresented minority students each summer on interdisciplinary research in stem cell engineering and machine learning.
Outreach activities will include developing hands-on K-12 activities, partnering with local organizations, organizing lab tours, and presenting research at the "Mending Broken Hearts" gallery exhibit, with the aim of increasing STEM participation among underrepresented groups.
This research project addresses critical challenges in the adoption and scalability of human-induced pluripotent stem cells (hiPSCs) by developing novel machine learning methodologies. The specific problems targeted are the need for high-accuracy, cost-effective cell type identification during differentiation and the incorporation of prior biological knowledge into explainable machine learning models. The PIs intend to create explainable machine learning algorithms that leverage single-cell RNA sequencing (scRNA-seq) and imaging data to provide counterfactual explanations, highlighting the key genes or image features that are critical for cell typing. These models will use mixed-integer programming to compute counterfactual explanations and generate interpretable predictions, addressing the limitations of current black-box approaches. Additionally, the project aims to overcome data scarcity by integrating biological knowledge into the machine learning frameworks through novel physics-informed machine learning algorithms. This research will develop and benchmark these methods and apply them to the study of Tumor-Associated Neutrophils (TANs) for cancer therapy. By enhancing explainability in cell typing predictions, this work aims to advance the field of stem cell research and its applications in regenerative medicine and oncology.
This project is jointly funded by the Mathematical Biology Program in the Division of Mathematical Sciences, the Infrastructure Innovation for Biological Research program in the Division of Biological Infrastructure (BIO/DBI), and the Office of Strategic Initiatives.
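As a rough illustration of the counterfactual explanations mentioned in the abstract, the sketch below asks which few features (for example, gene expression levels) would have to change for a classifier to assign a cell to a different type. It is a toy under strong assumptions and not the project's method: the classifier is assumed to be linear, the weights, bounds, and expression values are made up, the function name sparse_counterfactual is hypothetical, and the small mixed-integer problem (one binary "changed or not" choice per feature) is solved by exhaustive enumeration rather than by a mixed-integer programming solver.

```python
from itertools import combinations


def sparse_counterfactual(x, w, b, lower, upper, margin=0.0):
    """Find the fewest feature changes that flip a linear classifier.

    Assumes the classifier predicts the target cell type when
    w . x + b >= margin, and that each feature j may be moved anywhere
    in [lower[j], upper[j]]. Returns (changed_indices, x_cf), or None
    if no counterfactual exists within the bounds.
    """
    n = len(x)
    base = sum(wj * xj for wj, xj in zip(w, x)) + b
    if base >= margin:
        return (), list(x)  # already classified as the target type

    # For a linear score, the most helpful move for feature j is to the
    # bound that increases w[j] * x[j] the most.
    best_val = [upper[j] if w[j] > 0 else lower[j] for j in range(n)]
    gain = [w[j] * (best_val[j] - x[j]) for j in range(n)]

    # Enumerate subsets by increasing size, so the first feasible size
    # is the minimum number of changed features.
    for k in range(1, n + 1):
        best = None
        for subset in combinations(range(n), k):
            total = sum(gain[j] for j in subset)
            if base + total >= margin and (best is None or total > best[0]):
                best = (total, subset)
        if best is not None:
            x_cf = list(x)
            for j in best[1]:
                x_cf[j] = best_val[j]
            return best[1], x_cf
    return None


# Hypothetical normalized expression levels for four placeholder genes.
x = [0.2, 0.9, 0.1, 0.5]
w = [1.5, -2.0, 0.5, 0.1]  # made-up classifier weights
b = -0.5
lower = [0.0, 0.0, 0.0, 0.0]
upper = [1.0, 1.0, 1.0, 1.0]

changed, x_cf = sparse_counterfactual(x, w, b, lower, upper)
print("features to change:", changed)
print("counterfactual profile:", x_cf)
```

In this toy, the returned indices play the role of the "key genes" in an explanation: they are the smallest set of features whose change would alter the predicted type. Enumeration is only workable for a handful of features; with thousands of genes and a nonlinear model, a dedicated mixed-integer solver would be needed.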
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.