Genome-wide characterization of complex variants and their phenotypic effects in African populations

$249,999 | U01 | FY2024 | HG | NIH

Covenant University, Ota

Investigators

Jelili Olanrewaju Oyelade (contact), Melissa Gymrek, Daudi Jjingo


Abstract

PROJECT SUMMARY Advances in omics technology make it possible to build integrative models of disease risk that can improve health outcomes. However, the utility of these models has so far been limited to non-African populations because of biases in available datasets. Further, efforts to identify medically relevant genetic variants have examined only a subset of known genetic variants and have given limited attention to the phenotypes most relevant to Africa. Newly available genomic datasets from the African continent provide a rich opportunity to begin addressing this gap. Most large genomics efforts in both African and non-African populations have focused on single nucleotide polymorphisms (SNPs), excluding a large fraction of more complex and ancestry-specific variant types such as genomic repeats. Here, we consider multiple complex variant types, focusing on tandem repeats (TRs). TRs are well known to contribute to human disease: large repeat expansions are implicated in Huntington's disease and other disorders, and stepwise variation in repeat copy number at TRs has been implicated in a variety of complex traits. Although their role in human phenotypes is well established, discovery efforts in repeat regions have been largely limited to datasets and phenotypes dominated by non-African individuals. We hypothesize that detailed analysis of repeat variants in Africa will identify novel disease-associated loci, including pathogenic repeat expansions, and will improve the utility of risk prediction models, ultimately leading to better diagnosis and health outcomes.
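To make "repeat copy number" concrete: a tandem repeat is a short motif (for example CAG) repeated back to back, and alleles at a TR locus differ by how many copies they carry. The sketch below is a toy illustration only, not the project's genotyping method; the function name `max_repeat_copies` and the example sequences are hypothetical, and real TR genotyping from sequencing reads is considerably more involved.

```python
def max_repeat_copies(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`.

    Toy illustration of TR copy number; hypothetical helper, not a real tool's API.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    sequence = sequence.upper()
    motif = motif.upper()
    k = len(motif)
    best = 0
    # Try every start offset; from each, count consecutive exact motif copies.
    for start in range(len(sequence)):
        count = 0
        pos = start
        while sequence[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Two toy alleles differing stepwise by two CAG copies.
allele_a = "ACAGCAGCAGCAGTT"
allele_b = "ACAGCAGCAGCAGCAGCAGTT"
print(max_repeat_copies(allele_a, "CAG"), max_repeat_copies(allele_b, "CAG"))
```

A difference of a few copies between alleles is the kind of stepwise variation mentioned above; a pathogenic expansion is the same measurement at a much larger copy number.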
Our proposal leverages existing and novel data analysis approaches to interrogate technically challenging repetitive regions, and it integrates diverse genomics datasets from across the African continent, including (1) whole-genome sequencing (WGS) data from more than 1,000 individuals, (2) SNP array data from more than 10,000 individuals, and (3) health outcome information related to trypanosomiasis, HIV status, chronic kidney disease, cancer risk, and cardiometabolic traits with high prevalence in African populations. We will further incorporate existing biobanks containing tens of thousands of diverse genomes (admixed individuals of African ancestry from All of Us and UK Biobank) to validate findings and improve statistical power. The overall goal of this proposal is to improve health outcomes in Africa using innovative data analysis and machine learning techniques. Specifically, we will characterize genome-wide TR variation in African individuals (Aim 1), identify signals of positive and negative selection at these regions (Aim 2), and identify TRs associated with medically relevant phenotypes and generate improved ancestry-specific polygenic risk scores (Aim 3). We bring together a diverse team spanning Africa (headed by MPIs Adebiyi and Jjingo) and the US (MPI Gymrek) that has already established a fruitful collaboration. Further, analyses will be performed primarily on existing African supercomputing infrastructure and led by new and early-stage African investigators and trainees. Overall, the proposed aims will likely identify novel medically relevant genetic variants and continue to foster data science capabilities within Africa.
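A polygenic risk score is, in its simplest form, a weighted sum over variants: score = sum over i of beta_i * g_i, where beta_i is a per-variant effect weight and g_i is the individual's genotype value. The sketch below assumes, for illustration only, that SNPs are encoded as alternate-allele dosage (0, 1, or 2) and a TR as the summed repeat length of its two alleles; the variant IDs and weights are hypothetical, and this is not the scoring method proposed in Aim 3.

```python
from typing import Mapping


def polygenic_score(genotypes: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of genotype values over variants that have a weight.

    Assumption for this sketch: variants with a weight but no genotype are
    skipped rather than imputed.
    """
    return sum(w * genotypes[v] for v, w in weights.items() if v in genotypes)


# Hypothetical weights: one SNP (per alt allele) and one TR (per repeat unit).
weights = {"snp_a": 0.5, "tr_b": 0.25}
# Hypothetical individual: heterozygous at snp_a, 12 total repeat units at tr_b.
person = {"snp_a": 1, "tr_b": 12}
print(polygenic_score(person, weights))
```

Treating SNPs and TRs as columns of one weighted sum is what lets TR genotypes be added to an existing SNP-based score; skipping missing variants keeps the sketch short, though in practice missing genotypes are often imputed or the score is rescaled.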

Original record: NIH RePORTER