A Turnkey Solution for Next Generation Sequence Data Analysis
Emory University, Atlanta GA
Abstract
DESCRIPTION (provided by applicant): Modern biology continues to be revolutionized by high-throughput data production technologies. Nowhere is this more obvious than in the case of "next-generation" DNA sequencing technologies, which offer dramatically higher throughput and lower cost than previous approaches. Not only do these technologies make genome sequencing and resequencing more widely available, they have also driven the development of a variety of novel genome-wide (and data-intensive) functional assays. But are these methods really accessible to experimentalists? Although the financial cost of sequencing has been substantially reduced, there is still a significant barrier preventing experimental biologists from making effective use of this data. Translating the data generated by these new technologies requires sophisticated computational infrastructure for large-scale data management and analysis, and that infrastructure must be accessible to experimentalists. Genomic data discovery is no longer the limiting factor for much genomic research; instead, the problem lies in providing the data, analysis tools, and protocols in a form that bench biologists can use, so that they can take full advantage of their data. We have developed a framework, Galaxy, that makes it easy to provide accessible interfaces to computational tools and gives experimental biologists an intuitive and consistent interface for performing sophisticated analyses with minimal effort, regardless of the scale of data involved. Here we propose to use this existing framework to build a complete "turnkey" solution for accessible management and analysis of next-generation sequence data. This solution will automatically make data produced by sequencing instruments available to bench biologists through Galaxy's user-friendly analysis environment.
Into this environment we will integrate a large set of tools for sequence data analysis, along with pre-defined best-practice "workflows" for common analysis problems. The entire solution will be provided as a pre-configured, ready-to-run package that any lab or provider of sequencing services can easily deploy, enabling their users to truly realize the promise of next-generation sequencing technologies.

PUBLIC HEALTH RELEVANCE: A new generation of high-throughput DNA sequencing technologies has made a variety of novel data-intensive, genome-scale experiments both possible and relatively inexpensive, putting these techniques within the reach of many more labs. However, these dramatic improvements in the availability and cost of sequencing have not yet been matched by easy-to-use, scalable, integrated, and flexible data analysis capabilities. The proposed project will develop an integrated data management and analysis solution that allows biomedical researchers to easily and efficiently work with the data produced by these revolutionary new technologies.