Population Structure Admixture and Selection across the 1000 Genomes Data Set

$436,083 | U01 | FY2011 | HG | NIH

Stanford University, Stanford CA

Abstract

DESCRIPTION (provided by applicant): The 1000 Genomes Project (TGP) has tremendous potential to answer fundamental questions in human population genetics and to shape the design of future medical genomic studies. Realizing this potential depends on efficient, robust, and powerful computational methods for analyzing the large volume of data the project generates. Here, we propose novel approaches for characterizing population structure, analyzing patterns of admixture, and localizing signatures of selection across the 2,000 samples of the TGP. Our project has three primary aims. First, we will construct detailed models of human demographic history based on the TGP. To accomplish this, we will develop approaches for analyzing the joint allele frequency spectrum of rare and common SNPs, copy number variants (CNVs), and haplotypes across all surveyed populations. Full sequence data will make these approaches dramatically better at inferring the recent past, where distortions in frequency spectra are particularly important for testing associations with rare variants. Second, we will characterize patterns of population structure and admixture in the four Hispanic/Latino and three African-American TGP samples. The TGP presents a tremendous opportunity to catalyze population and medical genomics research for these important and understudied ethnic minority groups. We will develop novel statistical genomic approaches for reconstructing the genetic history of admixed populations and apply them to the TGP samples. Our methods will be tailored to short-read sequence data and will leverage the trio design of the sampling. Third, we will detect signatures of balancing, purifying, and positive selection in the full TGP data set. We will develop software tools that integrate signatures of natural selection, based on a new approach that uses numerical methods to fit a diffusion approximation to the multi-dimensional site frequency spectrum. This approach allows identification of distortions caused by positive, balancing, or negative selection, and it is especially well suited to low-coverage short-read sequence data. These inferences will be integrated with maps of genome-wide association study (GWAS) hits to accelerate the discovery of disease-associated variants.

RELEVANCE: Medical genetics research provides a vehicle for uncovering the heritable basis of complex disease. The 1000 Genomes Project is an international effort to sequence the genomes of approximately 2,000 diverse human subjects. We propose to analyze these data to characterize differences among genomes and to catalyze medical and population genomic research throughout the world.
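The abstract's demographic and selection analyses both rest on the joint (multi-dimensional) site frequency spectrum. For two populations, entry [i, j] of that spectrum counts the SNPs carrying i derived copies in the first population's sample and j in the second's. The sketch below shows how such a spectrum might be tabulated. It is only an illustration of the summary statistic, not the applicants' diffusion-fitting method. It assumes biallelic sites with a known ancestral allele, unphased diploid genotypes coded as derived-allele counts (0, 1, or 2), and no missing data. The function name `joint_sfs` and the toy genotypes are hypothetical.

```python
import numpy as np


def joint_sfs(geno_pop1, geno_pop2):
    """Tabulate an unfolded two-population joint site frequency spectrum.

    Hypothetical helper for illustration. Each argument is an array of shape
    (n_sites, n_individuals) holding derived-allele counts (0, 1, or 2) for
    diploid individuals. Assumes biallelic sites, a known ancestral state,
    and no missing genotypes.
    """
    g1 = np.asarray(geno_pop1, dtype=int)
    g2 = np.asarray(geno_pop2, dtype=int)
    if g1.shape[0] != g2.shape[0]:
        raise ValueError("both populations must be genotyped at the same sites")
    if g1.min() < 0 or g1.max() > 2 or g2.min() < 0 or g2.max() > 2:
        raise ValueError("genotypes must be derived-allele counts 0, 1, or 2")

    # Haploid sample size (number of sampled chromosomes) per population.
    n1 = 2 * g1.shape[1]
    n2 = 2 * g2.shape[1]

    # Derived-allele count at each site within each population.
    c1 = g1.sum(axis=1)
    c2 = g2.sum(axis=1)

    # sfs[i, j] = number of sites with i derived copies in pop 1 and j in pop 2.
    sfs = np.zeros((n1 + 1, n2 + 1), dtype=int)
    np.add.at(sfs, (c1, c2), 1)
    return sfs


if __name__ == "__main__":
    # Toy data: 3 sites, 2 individuals in pop 1, 1 individual in pop 2.
    pop1 = [[0, 1], [2, 2], [1, 0]]
    pop2 = [[0], [1], [2]]
    print(joint_sfs(pop1, pop2))
```

Model-based methods of the kind the abstract describes would compare an observed spectrum like this with the spectrum expected under a demographic or selection model. The corner entries [0, 0] and [n1, n2], which hold sites that are monomorphic across the whole sample, are commonly masked before fitting.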

Original record: NIH RePORTER