Bayesian Differential Causal Network and Clustering Methods for Single-Cell Data

$304,449 · R01 · FY2022 · GM · NIH

Texas A&M University, College Station TX

Abstract

Project Description DMS/NIGMS 2: Bayesian Differential Causal Network and Clustering Methods for Single-Cell Data

A Significance

A.1 Importance of the Problem to Be Addressed

Single-cell RNA-sequencing (scRNA-seq) technologies have enabled biological discoveries that were impossible with bulk RNA-seq, such as identifying new gene regulatory activities and cell types at the single-cell level. However, translating the fundamental biological knowledge advanced by scRNA-seq into improved disease diagnosis, treatment, and prevention requires new methods for comparing the molecular differences between normal and pathological cells/tissues, and between control and case/treatment groups. Although identification of differentially expressed genes across two sample groups has been extensively studied, the vast majority of existing methods for identifying gene regulatory networks (GRNs) and cell types have so far focused on scRNA-seq data generated under a single experimental condition. In principle, these methods can be applied to one experimental condition at a time, followed by post hoc comparisons to find the differences caused by experimental interventions. However, compared to joint modeling approaches, this two-step procedure is deemed less efficient and more susceptible to false discoveries because uncertainty is not properly propagated from the first step to the second. Moreover, most scRNA-seq network models are correlative in nature and do not infer causal gene regulatory relationships. There is, therefore, a critical need to develop new models for identifying the effects of experimental interventions on causal gene regulation and cell composition by jointly modeling scRNA-seq data across experimental groups.
In the absence of such tools, mechanistically understanding gene regulation and cell differentiation, and fully realizing the translational value of scRNA-seq studies, will likely remain difficult.

A.2 Rigor of Prior Research

Aim 1. Many existing scRNA-seq network approaches adapt standard association measures to zero-inflated scRNA-seq data, e.g., Pearson correlation [1] and mutual information [2]. A common limitation of these methods is that they quantify only marginal dependencies, which makes them susceptible to spurious indirect associations [3]. Graphical models, which deal with conditional associations, are powerful alternatives to marginal association measures. Numerous methods have been proposed for general purposes [4, 5], including developments for non-Gaussian data [6–9]. Specifically for scRNA-seq data, two undirected graphical models, including Co-I Cai's work [10, 11], were recently proposed based on neighborhood selection; however, they do not infer causal gene regulation. To identify causal relationships, several alternative methods [12, 13] were developed. However, these methods either ignore the count nature of scRNA-seq data, require a known pseudotime (which is rarely known in real scRNA-seq data), or do not theoretically investigate causal identifiability for cross-sectional observations. For differential networks, many approaches [14–18], including the PI's prior work [19], have been developed for bulk RNA-seq data and have shown great advantages of joint analyses over independent analyses. However, far fewer differential network methods exist for scRNA-seq data, e.g., PT [20] and scdNet [21]. The common limitation of PT and scdNet is that they consider only marginal dependence (and are hence susceptible to false discoveries) and do not discover causality.
Our preliminary results (§C.1) demonstrate that the proposed Bayesian network model is capable of identifying causal gene regulatory relationships in cross-sectional scRNA-seq data and often outperforms state-of-the-art alternative methods.

Aim 2. Very few methods are available to construct cell-specific networks because it is difficult to estimate networks with, in essence, a sample size of one. Recently, a hypothesis testing approach [22] was developed to estimate cell-specific networks. The method makes approximate network inference for each cell based on its neighbors. However, it considers only symmetric (undirected) marginal dependence, and therefore cannot infer causal regulatory relationships and is susceptible to spurious associations. The PI's prior work [23] addressed the "sample-size-one" problem in bulk RNA-seq data by assuming that the causal networks are smooth functions of additional covariates. However, the method is not applicable without covariates and does not allow feedback loops, a common motif in GRNs. Existing work [24, 25] including the PI's [19] has
