Genomic Data Science Core
Dartmouth College, Hanover NH
Abstract
RESEARCH CORE: GENOMIC DATA SCIENCE CORE SPECIFIC AIMS
The next-generation sequencing (NGS) revolution has generated massive amounts of new data that have transformed the field of genomics. Furthermore, the continued development of new and existing genomics technologies is increasing the throughput of current platforms while also giving rise to novel data types that measure an ever-growing list of genomic modalities [1]. Bioinformatic, computational, and statistical analysis approaches are critical for extracting meaningful biological insights from the high-dimensional datasets generated by NGS technologies. Complex bioinformatics approaches have played a central role in recent scientific milestones, such as the completion of the entire human genome sequence, including closure of its remaining gaps [2], and the rapid assembly of the SARS-CoV-2 genome during the COVID-19 pandemic [3]. Interdisciplinary frameworks that foster collaboration between quantitative and experimental researchers are required to sustain discovery in the genomic era. The field of single-cell genomics has recently seen intense and rapid development of novel technologies that have provided deeper insights into a vast array of biological processes [4]. These technologies continue to increase not only the number of cells examined in a single experiment but also the number of genomic modalities that can be measured simultaneously. For example, the integration of genomic and microscopy technologies has spawned the field of spatial transcriptomics, which enables spatial analysis of genome-wide gene expression at single-cell resolution [5]. However, realizing the promise of these technologies requires the concurrent development of computational methods that can efficiently draw robust insights from these unique data. Developing novel approaches for specific single-cell applications is an active area of research, with a constant stream of new methods becoming available to the research community.
However, leveraging these methods to generate relevant insights requires teams of bioinformaticians, computational biologists, and quantitative methodologists with diverse interdisciplinary backgrounds in genomics, statistics, data science, and computing. In Phase 1, the Data Analytics Core (renamed for Phase 2 as the Genomic Data Science Core, GDSC) developed a dynamic and interactive core facility that met the unique analytical needs of its wide user base. In Phase 2, we will build on the services established in Phase 1 to serve the new research project leads (RPLs) as well as the wider Dartmouth research community. Specifically, we will develop analysis pipelines for spatial transcriptomics and incorporate them into our analysis portfolio to support the investment in cutting-edge instrumentation made by the Single-Cell Genomics Core (SCGC). In addition, we will continue to develop and incorporate data analysis solutions for other emerging genomics technologies, such as long-read sequencing applications. We will also expand our series of nationally recognized online genomic data science workshops, which train participants in the fundamental concepts of practical genomic data analysis, to include analysis of single-cell transcriptomics data. Collectively, these efforts will allow us to support the extensive user base amassed in Phase 1 while further developing a fee structure that positions us for sustainability into Phase 3 and beyond. These services align with the analytic needs of the new project leads, the larger Dartmouth and IDeA communities, and the ever-expanding technologies available in the SCGC.
Specific Aim 1. To support the development and implementation of novel approaches for analysis of -omics data.
We will leverage our experience applying cutting-edge genomic data analysis techniques to support the development and implementation of novel methods for analyzing bulk, single-cell, and spatial genomics data from research at Dartmouth and elsewhere. GDSC personnel will stay abreast of the latest analytical advances to facilitate the implementation of such state-of-the-art methods. Where relevant, we will also develop resources and training materials that facilitate the dissemination and use of new analysis methods and of genomics datasets created by project leads.
Specific Aim 2. To support cluster computing, pipeline development, and access to biological databases for COBRE Center Project Leaders, mentors, Dartmouth, and the wider IDeA community.
We will further develop computational resources that support cluster computing for multidisciplinary genomics research among a wide range of investigators at Dartmouth. These resources will include the development and maintenance of existing and novel standardized data analysis pipelines; the aggregation and hosting of valuable genomics reference datasets on the Dartmouth computing infrastructure; and the development of new educational materials for single-cell and spatial transcriptomics data analysis. This core will continue to serve all the proposed COBRE research projects while remaining highly integrated with the SCGC. Such integration will be essential in Phase 2 to ensure that experimental designs meet the requirements for analysis, given the complexity of both the novel instrumentation and the experimental designs of the proposed projects. Combined with our expanding educational resources, the GDSC will have a substantial and durable positive impact on the quality of biomedical research at Dartmouth and beyond.