Preparing Association Analysis Software Tools for Next Generation Sequencing Data

$363,970 | R01 | FY2016 | HG | NIH

Harvard School Of Public Health, Boston MA

Abstract

DESCRIPTION (provided by applicant): The availability of next-generation sequencing data in large-scale association studies provides a unique research opportunity. The data contain the information required to identify causal disease susceptibility loci (DSLs) for many mental health phenotypes and psychiatric diseases. To translate this wealth of information into DSL discovery, powerful statistical methodology is required. A large number of rare-variant association tests have been proposed, but they do not incorporate all the important information about the variants; in particular, none of the existing approaches takes the physical location of the variant into account. Under the assumption that deleterious DSLs and protective DSLs cluster in different genomic regions, we will develop a general association analysis framework built on spatial clustering approaches. The framework will handle complex phenotypes (e.g., binary or quantitative) and apply to different study designs, i.e., family-based studies and designs of unrelated subjects. If the DSLs do cluster, the gain in statistical power will be of practical relevance, enabling the discovery of DSLs. In the absence of DSL clustering, our approach will achieve power levels similar to those of existing methodology. Furthermore, to test larger genomic regions for association, we will develop network-based association methodology. The network-based approach will have sufficient power for larger genomic regions than existing approaches can handle and, at the same time, provide an intuitive understanding of the complex relationships between the variants that drive the association, fostering new biological insights. The approach can incorporate complex phenotypes and different design types.
We will also use the information about the physical locations of the rare variants to detect population substructure/admixture. Since rare variants are genetically much younger than common variants, approaches that take the physical locations of the variants and their clustering into account will provide a much finer-resolution picture of population substructure in sequence data than existing approaches, e.g., EIGENSTRAT. We will use a community-detection algorithm to classify study subjects into genetically homogeneous subgroups. All the proposed methodology will be implemented in user-friendly software packages with existing user communities, i.e., PBAT, NPBAT and R.
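
To illustrate why variant position can matter, here is a minimal Python sketch of a positional window scan over rare variants. It is not the applicants' method and not part of PBAT or NPBAT. The function names, the toy data, and the use of raw carrier counts (which assumes equal numbers of cases and controls) are all hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# Hypothetical record: (position in bp, carriers among cases, carriers among controls).
Variant = Tuple[int, int, int]


def window_scan(variants: List[Variant], width: int) -> List[Tuple[int, int, int]]:
    """For each window that starts at a variant position and spans `width` bp,
    return (window start, case carrier count, control carrier count)."""
    ordered = sorted(variants)  # sort by position
    results = []
    for i, (start, _, _) in enumerate(ordered):
        cases = controls = 0
        for pos, n_case, n_ctrl in ordered[i:]:
            if pos - start >= width:
                break  # variant lies outside the current window
            cases += n_case
            controls += n_ctrl
        results.append((start, cases, controls))
    return results


def most_imbalanced(windows: List[Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Pick the window with the largest absolute case/control difference.
    Raw counts are a stand-in here, not a calibrated test statistic."""
    return max(windows, key=lambda w: abs(w[1] - w[2]))


# Toy data (assumed): a case-enriched cluster near 100-180 bp and a
# control-enriched cluster near 900-950 bp.
toy_variants = [(100, 3, 0), (150, 2, 1), (180, 4, 0), (900, 0, 3), (950, 0, 2)]

pooled = (sum(v[1] for v in toy_variants), sum(v[2] for v in toy_variants))
windows = window_scan(toy_variants, width=100)
best = most_imbalanced(windows)
```

In this toy data, pooling all variants gives 9 case carriers against 6 control carriers, while the window starting at position 100 gives 9 against 1, because the control-enriched cluster no longer dilutes the case-enriched one. A real analysis would replace the raw difference with a proper statistic and account for multiple overlapping windows.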
