Genomic Data Science Core

$349,251 · P20 · FY2024 · GM · NIH

Dartmouth College, Hanover NH

Linked publications & trials

Paper 39630869 Paper 39558036 Paper 39443665 Paper 39437760 Paper 39386631 Paper 39367648 Paper 39345531 Paper 39273048 Paper 39091790 Paper 39041911 Paper 39028368 Paper 38918792 Paper 38909241 Paper 38872103 Paper 38839374 Paper 38699246 Paper 38698086 Paper 38688903 Paper 38688897 Paper 38683883 Paper 38664753 Paper 38593809 Paper 38585810 Paper 38473207 Paper 38463039 Paper 38444174 Paper 38412523 Paper 38356412 Paper 38328160 Paper 38279041 Paper 38206988 Paper 38172524 Paper 38160301 Paper 38160300 Paper 38146185 Paper 38106230 Paper 38076825 Paper 38014029 Paper 37864429 Paper 37810104 Paper 37759464 Paper 37745797 Paper 37736781 Paper 37603589 Paper 37577680 Paper 37498173 Paper 37439680 Paper 37414528 Paper 37377320 Paper 37358142 Paper 37341477 Paper 37333096 Paper 37260292 Paper 37220856 Paper 37131809 Paper 37114077 Paper 37089874 Paper 37066315 Paper 37037284 Paper 36993590 Paper 36946983 Paper 36909536 Paper 36879097 Paper 36865335 Paper 36865186 Paper 36859337 Paper 36824707 Paper 36728432 Paper 36700236 Paper 36661299 Paper 36603054 Paper 36260656 Paper 36214642 Paper 36168291 Paper 36152945 Paper 36028292 Paper 36018850 Paper 35946522 Paper 35921406 Paper 35820705 Paper 35696724 Paper 35693984 Paper 35584140 Paper 35572643 Paper 35502894 Paper 35450022 Paper 35382559 Paper 35140201 Paper 35018295 Paper 34890149 Paper 34544277 Paper 34404841 Paper 34390574 Paper 34143767 Paper 33889174 Paper 33870136 Paper 33763292 Paper 33654308 Paper 33363444 Paper 33208194

Abstract

RESEARCH CORE: GENOMIC DATA SCIENCE CORE SPECIFIC AIMS.
The next-generation sequencing (NGS) revolution has generated massive amounts of data that have transformed the field of genomics. Ongoing development of new and existing genomics technologies continues to increase the throughput of established platforms while also giving rise to novel data types that measure an ever-growing list of genomic modalities[1]. Bioinformatic, computational, and statistical analysis approaches are critical for extracting meaningful biological insights from the high-dimensional datasets generated by NGS technologies. Complex bioinformatics approaches have played a central role in recent scientific milestones, such as the completion of the entire human genome sequence[2] and the rapid assembly of the SARS-CoV-2 genome during the COVID-19 pandemic[3]. Interdisciplinary frameworks that foster collaboration between quantitative and experimental researchers are required to sustain discovery in the genomic era.
The field of single-cell genomics has recently seen rapid development of novel technologies that have provided deeper insights into a vast array of biological processes[4]. These technologies continue to increase not only the number of cells examined in a single experiment but also the number of genomic modalities that can be measured simultaneously. For example, the integration of genomic and microscopy technologies has spawned the field of spatial transcriptomics, which enables spatial analysis of genome-wide gene expression at single-cell resolution[5]. Realizing the promise of these technologies, however, requires the concurrent development of computational methods that can draw robust insights from these unique data efficiently. Development of novel approaches for specific single-cell applications is an active area of research, with a constant stream of new methods becoming available to the research community.
However, leveraging these methods to gain relevant insights requires teams of bioinformaticians, computational biologists, and quantitative methodologists with diverse interdisciplinary backgrounds in genomics, statistics, data science, and computing. In Phase 1, the Data Analytics Core (renamed for Phase 2 as the Genomic Data Science Core, GDSC) developed a dynamic, interactive core facility that met the unique analytical needs of its broad user base. In Phase 2, we will build on the services established in Phase 1 to serve the new research project leads (RPLs) as well as the wider Dartmouth research community. Specifically, we will develop spatial transcriptomics analysis pipelines and incorporate them into our analysis portfolio to support the investment in cutting-edge instrumentation made by the Single-Cell Genomics Core (SCGC). We will also continue to develop and incorporate data analysis solutions for other emerging genomics technologies, such as long-read sequencing. Furthermore, we will expand our series of nationally recognized online genomic data science workshops, which train participants in fundamental concepts of practical genomic data analysis, to include the analysis of single-cell transcriptomics data. Collectively, these efforts will allow us to support the extensive user base amassed in Phase 1 while further developing a fee structure that prepares the core for sustainability into Phase 3 and beyond. These services align with the analytical needs of the new project leads, the larger Dartmouth and IDeA communities, and the ever-expanding technologies available in the SCGC.
Specific Aim 1. To support the development and implementation of novel approaches for the analysis of -omics data.
We will leverage our experience applying cutting-edge genomic data analysis techniques to support the development and implementation of novel methods for analyzing bulk, single-cell, and spatial genomics data at Dartmouth and elsewhere. GDSC personnel will stay abreast of the latest analytical advances to facilitate the implementation of state-of-the-art methodologies. Where relevant, we will also develop resources and training materials that facilitate the dissemination and use of new analysis methods and of genomics datasets created by project leads.
Specific Aim 2. To support cluster computing, pipeline development, and access to biological databases for COBRE Center Project Leaders, mentors, Dartmouth, and the wider IDeA community. We will further develop computational resources that support cluster computing for multidisciplinary genomics research across a wide range of Dartmouth investigators. These resources will include the development and maintenance of existing and novel standardized data analysis pipelines; the aggregation and hosting of valuable genomics reference datasets on Dartmouth computing infrastructure; and the development of new educational materials for single-cell and spatial transcriptomics data analysis. This core will continue to serve all of the proposed COBRE research projects while remaining highly integrated with the SCGC. Such integration will be critical in Phase 2 to ensure that experimental designs meet the requirements of the analytical procedures, given the complexity of both the novel instrumentation and the experimental designs of the proposed projects. Combined with our expanding educational resources, the GDSC will have a substantial and durable positive impact on the quality of biomedical research at Dartmouth and beyond.

View original record on NIH RePORTER →