Design and analysis of adaptive multistage genetic association studies

$298,750 | R01 | FY2009 | HG | NIH

Virginia Commonwealth University, Richmond VA

Investigators

Linked publications & trials

Paper 23894747 Paper 23452721 Paper 23398781 Paper 21872442 Paper 20195266 Paper 20052610 Paper 19875103 Paper 19721433 Paper 19420056

Abstract

DESCRIPTION (provided by applicant): It has recently become possible to screen many genetic markers across the whole genome for their association with a disease. These genome-wide association studies (GWAS) offer great promise to identify common disease-predisposing variants. The goal of this project is to develop a flexible framework for designing cost-effective GWAS and optimizing subsequent replication efforts. For this purpose we will use a general framework for designing optimal multistage studies. In multistage designs, all the markers are genotyped and tested in a first stage. Only the promising markers are subsequently genotyped in a second stage using additional samples. Our approach offers three broad advantages. First, because of the large sample sizes that are required to discover disease-predisposing variants while controlling false discoveries, GWAS cost millions of dollars. Compared to single-stage GWAS, optimized multistage designs can achieve the same goals in terms of true and false discoveries with a 50-70% saving in the amount of genotyping. Second, single-stage designs are based entirely on assumptions that may be incorrect, potentially leading to goals not being achieved or to goals that could have been achieved at much lower cost. Multistage designs, however, offer the possibility of using information collected at the first stage(s) to design optimal follow-up studies. The trend to release GWAS data in the public domain will further increase the practical relevance of this adaptive feature of multistage designs, because many research groups are likely to start performing replication studies in their own samples after GWAS data are publicly released. Third, rather than using arbitrary rules (e.g., P-values smaller than 0.05 suggest a replication), our framework will provide statistically motivated decision rules for declaring significance and for the subsequent interpretation of what constitutes a replication.
Specific aims of our proposal include evaluating and improving the basic framework we have already developed. To make the approach applicable across a wide variety of research scenarios, we also propose a range of theoretical and computational extensions. To ensure its utility in practice, we will test our methods on real data. Finally, we plan to make the computer implementation available to a broad spectrum of researchers. Genome-wide association studies offer great promise to identify common disease-predisposing variants. The goal of this project is to develop a flexible framework for designing these studies in a cost-effective way and optimizing subsequent replication efforts.
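As a rough illustration of where the genotyping saving in a two-stage design comes from, the arithmetic can be sketched as follows. This is a minimal sketch and not the project's optimization framework: the function name, its parameters, and the example numbers are all assumptions made for illustration, and the calculation counts genotypes only, ignoring statistical power and false discovery control.

```python
def genotyping_saving(n_markers, n_samples, stage1_fraction, carry_fraction):
    """Fraction of genotypes saved by a two-stage design versus a single stage.

    Hypothetical helper for illustration only.
    n_markers        markers on the genome-wide panel
    n_samples        total samples available across both stages
    stage1_fraction  share of the samples genotyped in stage 1
    carry_fraction   share of markers deemed promising and carried to stage 2
    """
    # Single-stage design: every marker is genotyped in every sample.
    single_stage = n_markers * n_samples

    # Stage 1: every marker is genotyped, but only in part of the samples.
    stage1 = n_markers * n_samples * stage1_fraction

    # Stage 2: only the promising markers are genotyped in the remaining samples.
    stage2 = n_markers * carry_fraction * n_samples * (1 - stage1_fraction)

    return 1 - (stage1 + stage2) / single_stage


# Illustrative (assumed) inputs: 500,000 markers, 2,000 samples,
# 40% of samples in stage 1, 1% of markers carried forward.
saving = genotyping_saving(500_000, 2_000, 0.4, 0.01)
print(f"Genotyping saved: {saving:.1%}")
```

With these assumed inputs the saving is roughly 59%, which falls inside the 50-70% range quoted in the abstract. The real design problem is harder, because the stage 1 sample share and the selection threshold must be chosen so that true and false discovery goals are still met.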
