Generating the next generation of genomics resources for biomedical investigation using a powerful and cost-effective genome assembly strategy

$287,912 · U24 · FY2017 · HG · NIH

University Of California Santa Cruz, Santa Cruz CA

Investigators

Linked publications & trials

Paper 33335035 Paper 30485757 Paper 29914971 Paper 29092041

Abstract

DESCRIPTION (provided by applicant): Unravelling the genetic basis of human health and disease requires high-quality genome information. Heretofore, the community has relied on a single, haploid human reference sequence that does not represent the actual genome sequence of any single person. While this resource has been enormously beneficial for mapping many of the genetic variants responsible for normal and disease-related human variation, it can limit many types of analyses. Further, this single-reference model can cause uneven power to analyze genetic variation across human populations. Non-human primate genomes are also fundamentally important for understanding our own genome. While many draft primate genome assemblies are available, including for all great apes, these genomes are all of lower quality and contiguity than the human reference genome. Importantly, chromosome-scale scaffolding of these genomes was often done by comparison to the human reference. While this approximation is generally correct, knowing where it is wrong is critically important. Using a radically innovative and simple approach, we can now generate highly contiguous de novo assemblies of human and non-human primate genomes. The approach requires sub-microgram quantities of DNA and can be carried out from start to finish within a few months, including sequencing time. Our approach uses genome contiguity information derived from proximity ligation of in vitro assembled chromatin. It harnesses the speed and cost-effectiveness of high-throughput sequencing to generate large amounts of haplotype-phased contiguity data spanning well over 100 kilobases. Using this approach we will generate de novo assembled genomes from 50 humans and 12 non-human primates that are highly accurate and partially haplotype phased, with scaffold N50s expected to be between 10 and 20 Mb.
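
The abstract states its contiguity target as a scaffold N50 of 10 to 20 Mb. For readers unfamiliar with the metric, the sketch below shows one common way N50 is computed: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the total assembly length. The function name `scaffold_n50` and the example lengths are hypothetical and chosen only for illustration; they are not taken from the project.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in base pairs).

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")

    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in base pairs (illustrative only).
example_lengths = [25_000_000, 18_000_000, 12_000_000,
                   9_000_000, 4_000_000, 2_000_000]

n50 = scaffold_n50(example_lengths)
print(f"N50 = {n50 / 1e6:.1f} Mb")  # prints "N50 = 18.0 Mb"
```

In this toy assembly the total length is 70 Mb, so the half-way point of 35 Mb is reached within the second-longest scaffold, giving an N50 of 18 Mb.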
