
Statistical Methods For Gene/environment Interaction And Genetic Susceptibility

$35,337 | ZIA | FY2019 | ES | NIH

National Institute Of Environmental Health Sciences

Abstract

Identification of causative SNPs in a genome-wide study can be challenging when individual SNPs have small marginal effects, because testing thresholds must reflect the large number of SNPs under study. For complex diseases, particular combinations of SNPs may dramatically increase risk, a kind of epistasis or gene-gene interaction. We are currently investigating the use of a machine learning technique for the discovery of sets of SNPs that together cause disease (causative SNPs) in case-parent triad data. First, we devised a way to use actual case-parent triad genotypes to create simulated genome-wide data sets that reflect realistic linkage disequilibrium structure and are seeded with known sets of causative SNPs. This manuscript was recently published, and the computer code is publicly available. We are currently working to better characterize the genetic properties of populations simulated in this way. Second, we implemented an existing stochastic search algorithm (called GA-KNN), based on an evolutionary algorithm, to find multiple sets of k SNPs that are predictive of disease (here k is a small number, say 2 or 4). By cataloguing the SNPs that appear most frequently among the sets that are predictive of disease, we hope to uncover the sets of causative SNPs. In preliminary trials on simulated data seeded with two interacting sets of four SNPs each, our approach shows promise. In ongoing work, we are attempting to speed up the algorithm and to see whether the promising performance is maintained in more complex situations. (See also Z01 ES040007; PI Clare Weinberg. Min Shi is also a within-lab collaborator on this project; her time is allocated in Weinberg's project but not in this one.)
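
The search-and-tally idea in the abstract can be illustrated with a small sketch. This is not the investigators' code. It assumes genotypes coded 0/1/2, a binary status label per subject, Manhattan distance for the nearest-neighbour step, and a simple "keep the fitter half, mutate to refill" evolutionary loop. How case-parent triads would be turned into labelled observations is not specified in the abstract and is not modelled here. All function names and parameters below are hypothetical.

```python
import random
from collections import Counter


def loo_knn_accuracy(X, y, snps, n_neighbors=3):
    """Leave-one-out nearest-neighbour accuracy using only the SNP columns in `snps`."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Manhattan distance on genotype codes (0, 1, 2) to every other subject.
        dists = sorted(
            (sum(abs(X[i][s] - X[j][s]) for s in snps), j)
            for j in range(n) if j != i
        )
        votes = sum(y[j] for _, j in dists[:n_neighbors])
        predicted = 1 if 2 * votes > n_neighbors else 0
        correct += int(predicted == y[i])
    return correct / n


def mutate(snps, n_snps, rng):
    """Swap one SNP in the set for a random SNP not already in it (needs n_snps > k)."""
    child = list(snps)
    outside = [s for s in range(n_snps) if s not in child]
    child[rng.randrange(len(child))] = rng.choice(outside)
    return tuple(sorted(child))


def ga_knn_run(X, y, k, rng, pop_size=12, generations=10, target=0.9):
    """One evolutionary search; returns (best SNP set found, its accuracy)."""
    n_snps = len(X[0])
    cache = {}  # avoid rescoring SNP sets that were already evaluated

    def fitness(candidate):
        if candidate not in cache:
            cache[candidate] = loo_knn_accuracy(X, y, candidate)
        return cache[candidate]

    pop = [tuple(sorted(rng.sample(range(n_snps), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) >= target:  # stop early once a set is predictive enough
            break
        survivors = pop[: pop_size // 2]  # keep the fitter half
        children = [mutate(rng.choice(survivors), n_snps, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=fitness)
    return best, fitness(best)


def snp_frequencies(X, y, k, n_runs, seed=0):
    """Tally how often each SNP appears in the best set across independent runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        best, _score = ga_knn_run(X, y, k, rng)
        counts.update(best)
    return counts


def toy_data(n_subjects, n_snps, seed=1):
    """Toy genotypes; status is 1 only when SNPs 0 and 1 both carry a variant."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_subjects)]
    y = [int(row[0] > 0 and row[1] > 0) for row in X]
    return X, y
```

Calling `snp_frequencies(*toy_data(60, 12), k=2, n_runs=20)` returns a `Counter` of SNP indices. In the spirit of the abstract, SNPs that recur across many predictive sets would be the candidates to examine, though this toy setup says nothing about performance on realistic genome-wide data.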
