Development of Celera Whole Genome Shotgun Assembler
J. Craig Venter Institute, Inc., La Jolla CA
Abstract
DESCRIPTION (provided by applicant): The goal of this grant is to advance the genomic community's ability to assemble genome sequences from shotgun sequencing data accurately, easily, and rapidly. Specifically, we plan to enhance and maintain a leading shotgun fragment assembly program, the Celera Assembler. Shotgun DNA sequence data continues to be produced faster than it can be accurately assembled into genomic sequence. This problem will only be compounded as new sequencing machines, capable of producing large volumes of sequence data at low cost, come into active use, and assembly software must become more sophisticated to keep pace. The Celera Assembler was released into the public domain in 2004 and is managed as an open source project in the SourceForge repository. The quality and accuracy of assembly software directly affect the cost of genome sequencing projects and of genome closure, and the accuracy of the resulting genome sequence directly affects all of the health-related research that uses it. We have assembled an exceptional team, comprising a large portion of the original development team, to improve and maintain the Celera Assembler and to support its user base. The principal investigator co-led development of the Celera Assembler at Celera, and the three co-investigators all made significant contributions to its algorithms and code base. This team has demonstrated that enhancements to the Celera Assembler can significantly improve the quality of genome assemblies (11, 41). This grant will allow us to make the Celera Assembler more user-friendly and robust, capable of generating higher-quality assemblies, and able to incorporate data from new types of sequencers.
Toward this end, we will simplify and improve the algorithms and code, develop or incorporate tools for assessing assembly quality, test the code on multiple computer platforms, debug it on assemblies of numerous organisms, and develop a set of challenging benchmark assembly problems, based on real data, for rigorous regression testing that validates the results of improved algorithms. All algorithmic improvements will be published in the scientific literature and documented in the code base. The entire code base, together with the supporting analysis, benchmark, and regression tools, will be maintained as an open source project.