Computer Cluster and Storage to Support Whole Genome Sequencing and Analysis
University of California Los Angeles, Los Angeles, CA
Abstract
DESCRIPTION (provided by applicant): A new era of genetic research is now upon us with the arrival of massively parallel, high-throughput, whole genome DNA sequencers. These new instruments open avenues of genetic inquiry to advanced scientific laboratories that were previously closed by the prohibitive time and cost of large-scale sequencing. Local labs can now consider studies that were previously the purview of national labs. Whole genome technology is quickly becoming crucial to labs that wish to stay current in genetic research. As befits a world-class center of genetics, UCLA currently has three whole genome sequencers in Core facilities available to its scientists, and a fourth is being sought in a separate proposal. At UCLA there are currently two Illumina Genome Analyzers (aka Solexa) and one Roche Genome Sequencer FLX (aka 454); an Applied Biosystems SOLiD machine is the possible fourth platform. In addition, UCLA has genome-wide, high-throughput genotyping available using the Illumina BeadLab 1000. Unfortunately, this new technology carries a large hidden cost beyond the cost of the whole genome DNA sequencers and genotypers themselves. The amount of genetic data generated by these machines is roughly 1000-fold greater than was available only two years ago at the same cost per run. This quantity of data demands not only tremendous storage capacity but also considerable computational power to analyze thoroughly. This instrumentation grant requests funds for a storage array and computational cluster dedicated to whole-genome sequencing and analysis at UCLA. The storage array will have 84 TB total capacity; the computational cluster will have 128 Intel Xeon 3 GHz cores. These are enterprise-level instruments that are now available at reasonable prices. These instruments will be run by the Human Genetics Bioinformatics Core, which has several years of experience with these types of computing systems.
A number of ongoing research projects at UCLA can benefit immediately from this technology. There is also a large community of scientists engaged in biomedical research for whom this technology will be beneficial in the very near future. As a particular example, UCLA is generating and analyzing the data in an NIH-funded genome-wide search for schizophrenia genes. In addition, two of the projects outlined in this proposal are part of the Center for Rapid Influenza Surveillance and Research, which is investigating the distribution and transmission risks of avian influenza. Research on these critical public health issues urgently needs the requested instruments. There is a great deal of excitement about whole genome sequencing and analysis within the UCLA scientific community, whose members are eager to use the proposed equipment to move biomedical research at UCLA into the next era of genomic discovery.
PUBLIC HEALTH RELEVANCE: New massively parallel DNA sequencing technology allows whole genomes to be sequenced at a fraction of the previous time and cost. This new technology generates tremendous amounts of data that require large storage capacity and computing power to store and analyze. Bringing this DNA sequencing technology, and the power to analyze the resulting data, to the UCLA biomedical research community will allow scientists to develop deeper insight into the structure and function of the genome, and to advance the understanding, diagnosis, and treatment of public health issues.