Population Genetic Inferences from Dense Genotype Data

$324,451 · R01 · FY2009 · HG · NIH

Cornell University, Ithaca NY

Investigators

Andrew G Clark (contact), Carlos D. Bustamante, Rasmus Nielsen

Linked publications & trials

Paper 31539367 Paper 28007980 Paper 27756828 Paper 27111036 Paper 26712023 Paper 26494842 Paper 26383953 Paper 26198033 Paper 26186694 Paper 26093129 Paper 25963373 Paper 25963372 Paper 25557782 Paper 25403526 Paper 25329461 Paper 25233113 Paper 25144706 Paper 25043035 Paper 24926019 Paper 24813606 Paper 24809476 Paper 24770332 Paper 24708091 Paper 24469801 Paper 24458950 Paper 24379384 Paper 24256729 Paper 24128338 Paper 23979584 Paper 23910464 Paper 23908239 Paper 23791107 Paper 23733930 Paper 23699470 Paper 23666210 Paper 23077256 Paper 23071458 Paper 23019649 Paper 22960214 Paper 22911679 Paper 22604720 Paper 22582263 Paper 22511877 Paper 22457636 Paper 22456605 Paper 22253600 Paper 22211450 Paper 22072984 Paper 22072977 Paper 22048313 Paper 22022285 Paper 21940856 Paper 21935354 Paper 21917140 Paper 21775991 Paper 21753830 Paper 21738600 Paper 21730125 Paper 21663684 Paper 21587300 Paper 21493780 Paper 21383195 Paper 21196524 Paper 20981092 Paper 20890277 Paper 20876616 Paper 20690817 Paper 20663224 Paper 20660644 Paper 20595611 Paper 20579625 Paper 20558595 Paper 20552648 Paper 20466090 Paper 20433726 Paper 20382834 Paper 20148029 Paper 20067940 Paper 20010809 Paper 19851460 Paper 19815762 Paper 19812666 Paper 19713493 Paper 19713326 Paper 19662163 Paper 19279335 Paper 19255370 Paper 19087964 Paper 19087958 Paper 18987735 Paper 18922762 Paper 18670650 Paper 18516229 Paper 18411405 Paper 18288194 Paper 17989245 Paper 17943193 Paper 17542651 Paper 17435250 Paper 17431170

Abstract

DESCRIPTION (provided by applicant): Technological innovations arising from the HapMap Project have dramatically increased the speed and accuracy of genotyping while greatly reducing cost. Public and private efforts are beginning to release an unprecedented volume of human genotype and DNA sequence data into the public domain. To allow the best inferences about human variation and past human evolution from these data, we propose a series of investigations centered on four aims.

First, we will develop novel statistical methods for population genetic inference from high-throughput DNA sequencing platforms. Pyrosequencing technology will generate assembled alignments that represent a sampling of sequence reads across individuals (multinomial) and across homologous chromosomes within an individual (binomial), producing a complex mixture. Inference of population genetic parameters from such data will demand novel statistical approaches, and we outline a set of plans to develop statistically rigorous methods.

Second, we will develop methods to reverse-engineer the ascertainment biases of SNPs on widely used genotyping panels so as to enable population genetic inference. SNPs on the high-throughput genotyping platforms of Affymetrix and Illumina were ascertained in diverse and often irretrievable ways. Statistically sound population genetic inference from these data requires an understanding of the nature of the ascertainment bias of these platforms. We will reverse-engineer the ascertainment using ENCODE and other dense resequencing data, and use these inferences to apply ascertainment bias correction to high-density SNP platform data.

Third, we will develop novel methods for inference of natural selection from patterns of haplotype diversity within and among human populations and apply these approaches to publicly available data sets.
Methods of inference of natural selection from SNP frequency and haplotype diversity continue to gain in power and specificity. Optimizing these methods demands correction for ascertainment, demographic effects, and local variation in recombination, as well as imputation of missing data and of haplotype phase. We will make use of Markov-Hidden Markov models for jointly estimating the magnitude, location, and age of selection sweeps.

Finally, we will develop novel approaches for predicting the functional consequences of nucleotide substitutions in putatively functional regions of the human genome. Whole-genome association tests will gain power and specificity from prior inference of the likelihood that a SNP has a damaging effect on a gene's function. In addition, genome-wide association tests will be followed by extensive resequencing of candidate regions, where inferring the likelihood that each of the many rare variants is deleterious will also be useful. We propose methods that have advantages over existing approaches, making use of comparative genomic data, protein structure, cis-regulatory information, and patterns of segregating variation.

Project Narrative: This project will develop methods of statistical inference from human DNA resequencing and SNP genotype data that will allow accurate estimation of critical parameters that describe the structure of variation in human populations. These inferences can provide vital clues to identifying genes that are associated with risk of complex genetic disorders.
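
As a rough illustration of the binomial read sampling described under the first aim, the sketch below computes genotype likelihoods for one diploid individual at a biallelic site from read counts. It is not the applicants' method: the function names, the per-read `error_rate` parameter, and the Hardy-Weinberg prior used to turn likelihoods into posteriors are all assumptions introduced here for illustration.

```python
from math import comb


def genotype_likelihoods(alt_reads, total_reads, error_rate=0.01):
    """Return P(read counts | genotype) for 0, 1, and 2 copies of the alternate allele.

    Assumes reads are drawn independently and with equal probability from the
    two homologous chromosomes, and that each read is miscalled with
    probability `error_rate` (a hypothetical, illustrative parameter).
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        # Probability that a single read is observed as the alternate allele.
        p_alt = frac_alt * (1 - error_rate) + (1 - frac_alt) * error_rate
        # Binomial probability of seeing `alt_reads` alternate reads in total.
        lik = (
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** (total_reads - alt_reads)
        )
        likelihoods.append(lik)
    return likelihoods


def genotype_posteriors(alt_reads, total_reads, alt_freq, error_rate=0.01):
    """Combine the likelihoods with a Hardy-Weinberg prior (an assumption)."""
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    liks = genotype_likelihoods(alt_reads, total_reads, error_rate)
    joint = [lik * prior for lik, prior in zip(liks, priors)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # Three alternate reads out of six, population alternate frequency 0.2.
    print(genotype_posteriors(3, 6, alt_freq=0.2))
```

With low read depth the three likelihoods can be close to one another, which is one reason inference from such data may call for methods that carry genotype uncertainty forward rather than committing to a single called genotype.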
