Single-cell Analysis of Normal and Perturbed Dynamic Biological Systems

$677,833 · ZIA · FY2022 · ES · NIH

National Institute of Environmental Health Sciences

Abstract

The major goal of this project is to develop and apply innovative, fast and scalable computational frameworks that use single-cell analysis to improve our understanding of spatiotemporal biological processes during normal development and disease progression. This remains a major challenge because of technical and computational issues and a lack of appropriate benchmark data sets. I will summarize two published and two ongoing accomplishments of this project, which not only provide innovative tools for modeling spatiotemporal biological processes but also advance our knowledge of disease progression in the blood, respiratory, endocrine and reproductive systems. Our data-driven model, Dynamic Spanning Forest Mixtures (DSFMix), recently published in Briefings in Bioinformatics, uses decision trees to build a forest for visualizing and characterizing complex developmental processes that unfold over time. The input to DSFMix is single-cell data collected at different time points representing distinct stages of development, and its output is a mixture of nested cell lineages that may be discrete and/or continuous and directed or undirected. We first demonstrate the advantages of forest-based algorithms over single-tree approaches for visualizing and characterizing stem-cell development in the intestine and blood. We also benchmark DSFMix against several traditional pseudotime methods and recently published time-dependent trajectory models, using the observed sampling times and network similarity analysis. We further demonstrate how DSFMix can be used to visualize, test and characterize complex relationships during dynamic biological processes such as spermatogenesis, epithelial-mesenchymal transition (EMT), stem-cell pluripotency, early transcriptional responses to hormones, and the immune response to coronavirus disease 2019 (COVID-19). The study also shows how DSFMix can be combined with genomic correlation search engines to validate findings and make new scientific discoveries.
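
The abstract does not specify DSFMix's decision-tree procedure, so the Python sketch below illustrates only the general idea of linking cell states observed at successive time points into a forest, plus one common way of scoring network similarity between two lineage graphs. The nearest-parent rule, the edge-set Jaccard index, the function names and the toy centroids are all assumptions made for illustration; none of them is DSFMix itself.

```python
from math import dist


def nearest_parent_forest(centroids):
    """Link each cluster to its nearest cluster at the previous time point.

    `centroids` maps (time_point, cluster_id) -> coordinates, e.g. a cluster's
    mean position in a low-dimensional embedding (hypothetical input format).
    Clusters at the earliest time point get no parent, so each one roots its
    own tree and the returned set of (parent, child) edges forms a forest.
    """
    times = sorted({t for t, _ in centroids})
    edges = set()
    for prev_t, t in zip(times, times[1:]):
        parents = [key for key in centroids if key[0] == prev_t]
        for child in (key for key in centroids if key[0] == t):
            # Assumed rule for this sketch: the closest earlier cluster is the parent.
            parent = min(parents, key=lambda p: dist(centroids[p], centroids[child]))
            edges.add((parent, child))
    return edges


def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge sets: |A & B| / |A | B| (1.0 if both empty)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


# Toy centroids for two lineages sampled at three time points (made-up values).
centroids = {
    (0, "a"): (0.0, 0.0),
    (0, "b"): (5.0, 5.0),
    (1, "c"): (0.5, 0.2),
    (1, "d"): (5.2, 4.8),
    (2, "e"): (0.9, 0.1),
}
forest = nearest_parent_forest(centroids)
reference = {((0, "a"), (1, "c")), ((0, "a"), (1, "d")), ((1, "c"), (2, "e"))}
print(sorted(forest))
print(edge_jaccard(forest, reference))  # 0.5
```

Each child receives exactly one parent, which is what makes the result a forest rather than a general graph. A real trajectory method would also have to handle merging lineages, undirected relationships and continuous transitions, all of which this sketch ignores.
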
As an example of such a correlation search, mutation of the RNF17 gene, which encodes proteins expressed in the testes (Pan et al. 2015), is negatively correlated with the DSFMix gene signature, as is the knockdown signature of ubiquitin, an essential protein found in all eukaryotic cells (Bose et al. 2014). Similarly, MAMLD1, a causative gene for 46,XY disorders of sex development associated with abnormal development of the testes (Miyado et al. 2017), is negatively correlated with the expression of our gene signature. DSFMix can also be applied in a supervised setting to investigate relationships between the components of larger biological systems such as the human body. For example, we use DSFMix to visualize the immune response to COVID-19, representing each patient's progression from a healthy to a diseased state as an individual tree in a forest. We are currently applying DSFMix to single-cell time-course data on the immune response of 30 patients with and without COVID-19 to better understand the dynamic changes that lead to a patient either surviving or dying. Preliminary results suggest that interactions between CD4+ T cells and B cells are strongly associated with patient survival. This motivates the need for dynamic models of multicellular systems, such as our recently published Multiscale Multicellular Quantitative Evaluator (MMQE) in Frontiers in Molecular Biosciences. MMQE adopts a hybrid computational approach, combining continuous, discrete and stochastic non-linear model formulations, to predict a system-level immune response as a function of multiple dependent signals and interacting agents, including cytokines and targeted immune cells. MMQE quantifies the dynamics of lymphocyte proliferation mediated by a joint downregulation of IL-4 and upregulation of IL-2 during pathogen invasion.
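
MMQE's equations are not given in this abstract. The Python sketch below is a toy ordinary differential equation model, integrated with forward Euler, that reproduces only the qualitative behaviors the abstract reports for MMQE: expansion to a peak followed by decline, a peak that depends on the activating stimulus, and a larger response when IL-2 is high and IL-4 is low. It is not MMQE: it has no discrete or stochastic components, holds cytokine levels fixed instead of modeling ligand-receptor binding, represents the activation agent only by a scalar strength, and its function names, functional forms and parameter values are all hypothetical.

```python
def simulate(il2, il4, agent_strength=1.0, t_end=30.0, dt=0.01):
    """Forward-Euler integration of a toy T/B cell expansion model.

    IL-2 (il2) raises and IL-4 (il4) lowers a saturating "cytokine gain" on
    proliferation, following the direction described in the abstract. The
    activating stimulus decays over time, so proliferation eventually falls
    below cell death. Every functional form and constant here is assumed.
    Returns a list of (t, T, B) samples.
    """
    K2, K4 = 1.0, 1.0             # half-saturation constants (hypothetical)
    k_prolif = 1.5                # max proliferation rate per unit stimulus
    d_stim, d_cell = 0.3, 0.2     # stimulus clearance and cell death rates
    help_to_b = 0.1               # T-cell help to B cells (hypothetical term)
    gain = (il2 / (il2 + K2)) * (K4 / (K4 + il4))

    stim, T, B, t = agent_strength, 1.0, 1.0, 0.0
    samples = [(t, T, B)]
    for _ in range(round(t_end / dt)):
        rate = k_prolif * stim * gain
        dT = (rate - d_cell) * T
        dB = (rate - d_cell) * B + help_to_b * rate * T
        stim -= d_stim * stim * dt
        T += dT * dt
        B += dB * dt
        t += dt
        samples.append((t, T, B))
    return samples


def peak(samples, column):
    """Largest value of column 1 (T) or column 2 (B) over the trajectory."""
    return max(row[column] for row in samples)


high_il2 = simulate(il2=2.0, il4=0.5)
high_il4 = simulate(il2=0.5, il4=2.0)
stronger = simulate(il2=2.0, il4=0.5, agent_strength=2.0)
print(peak(high_il2, 1), high_il2[-1][1])     # T rises to a peak, then declines
print(peak(high_il2, 1) > peak(high_il4, 1))  # True
print(peak(stronger, 1) > peak(high_il2, 1))  # True
```

Because the stimulus decays exponentially while the death rate is constant, any run whose initial proliferation rate exceeds the death rate rises and then falls; the stimulus strength and cytokine balance change only the height and timing of the peak. With high IL-4 and low IL-2 in this toy setting, proliferation never exceeds death and the populations simply decline.
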
Specifically, MMQE explains the biological phenomenon of T cell activation followed by the proliferation of T, B and plasma B cells, as well as the antagonistic effects of IL-2 and IL-4 on lymphocyte proliferation in the presence of antigens. Using simulation studies, we first evaluate the immune response resulting from the binding of IL-2 and IL-4 to their receptors. We next assess how the immune response depends on the activation agent and on the interplay of IL-2 and IL-4 signaling. Regardless of the activation agent, T and B cells proliferate to a maximal level and then decline. However, the maximum number of T and B cells produced during their proliferation lifecycle depends on the activation agent: for example, B cell activating agents lead to higher maximal levels of T and B cell proliferation than dendritic cell (DC) activating agents. Additionally, the proliferation levels of T, B and plasma B cells depend tightly on the interaction between the cytokines IL-2 and IL-4, so that a strong combined response of T cells, B cells and plasma B cells is associated with a low concentration of IL-4 and a high concentration of IL-2. We validate the MMQE model using in vivo mouse models, targeting T cells as the variable of interest because of their relative importance in the immune response. We are currently extending the model to account for the early innate immune response to COVID-19 infection, which involves cells from myeloid lineages such as monocytes, by integrating single-cell data using machine learning, stochastic and mechanistic modeling. I will next highlight two ongoing applications of DSFMix with the potential to improve treatment for diabetes and treatment-resistant AML.

Glis3 regulation of endocrine differentiation: Diabetes is mainly caused by insufficient insulin secretion due to β-cell loss or pancreatic dysfunction.
Understanding the mechanisms of β-cell development is key to providing a cure. The transcription factor Glis3 has been shown to regulate endocrine progenitors (EPs) during β-cell development. We applied DSFMix to study the molecular mechanisms of progenitor cells during pancreatic morphogenesis in wild-type and Glis3-knockout conditions at two developmental stages, E15.5 and E18.5. By controlling for the observed variation in lineages between the wild-type and knockout conditions, DSFMix identified key transcription factors along with many other genes, such as Btbd17a, Fev, Gng12, Mafb and ghrelin (Ghrl), and epigenetic markers such as H2afz, all associated with progenitor, intermediate and β-cell differentiation. We plan to better understand the dynamic roles some of these key genes play in the wild type compared with the knockout model by optimizing the output of DSFMix with a new constrained stochastic ordering algorithm.

Integrating single-cell data from multiple patients at an early stage of AML treatment: The prognosis of most patients with acute myeloid leukemia (AML) is generally poor because of frequent relapse, which is thought to result from the persistence of leukemia-initiating stem cells after treatment. Early molecular changes may signal the establishment of chemotherapy resistance that can mediate long-term survival. However, intra-tumor and inter-tumor heterogeneity confounds our ability to map lineages and state transitions effectively during tumor progression and treatment response. The goal of this work is to extend DSFMix to study in detail the complex, dynamic and heterogeneous changes in the lymphoid and myeloid lineages after first-line treatment using single-cell analysis, and to identify resistant cell types or markers that predict long-term survival.
To apply DSFMix effectively in this setting, we implemented a new model framework designed for clinical applications, in which single-cell data are collected across multiple patients. This work is currently in progress.
