NHGRI/DIR Bioinformatics and Scientific Programming Core
National Human Genome Research Institute
Abstract
The Bioinformatics and Scientific Programming Core actively supports the research being performed by NHGRI/DIR investigators by providing expertise and assistance in scientific programming and computational analysis. The Core facilitates access to specialized software and hardware, develops generalized software solutions that can address a variety of questions in genomic research, develops database solutions for the efficient archiving and retrieval of experimental and clinical data, disseminates new software and database solutions to the genome community at large, collaborates with DIR researchers on computationally intensive projects, and provides educational opportunities in bioinformatics to trainees. Support for projects includes not only data analysis but also related efforts focused on data collection through the public DIR research web site, located at https://research.nhgri.nih.gov. Additional information can be found on the Core's web site, at https://dir.nhgri.nih.gov/nhgri_cores/BSPC.
Projects performed during the reporting period include:
- the implementation of a genome data portal for the preliminary annotation and analysis of the Hydra vulgaris AEP genome;
- providing bioinformatic support for The Genomic Ascertainment Cohort (TGAC), including implementation of the gnomAD variant browser, for establishing a shared genomic ascertainment cohort of at least 10,000 individuals whose genomes or exomes have been sequenced and are recallable for secondary phenotyping studies;
- maintaining a variant co-occurrence pipeline to identify TGAC individuals with two variants occurring on the same haplotype;
- synthesizing variant data from five cohorts, consisting of 11,000 genomes and 3,800 exomes, to be uploaded into the RPC Genomic Data Browser;
- analyzing single-cell RNA-seq and ATAC-seq data obtained from zebrafish sensory hair cells;
- implementing software to analyze data generated from the scSPRITE protocol, which allows for investigation of 3D genome arrangement at the single-cell level;
- web site development for the Cohort Analytics Core (CAC) and Reverse Phenotyping Core (RPC);
- providing support for the FUSION Project, investigating the genetic basis of type 2 diabetes disease risk through the use of single-cell and/or single-nuclei RNA-seq technology, with the goal of interrogating the transcriptomes of the individual cells that comprise the pancreatic islet. Differences in the single-cell transcriptomic signatures and cell type composition of islets obtained from diabetic and non-diabetic patients will be assessed;
- updating the Skippy web server to comply with security regulations, as well as including additional complementary tools for splicing prediction;
- performing isoform expression profiling of pan-cancer datasets in TCGA, analyzing methylation marks in parents caring for children with inherited metabolic disorders, and performing platelet transcriptome analysis in patients with familial platelet disorder with associated myeloid malignancies;
- determining differences in the genome-wide DNA methylation landscape between CBFB-MYH11 knock-in mice that serve as a model for leukemia and wild-type controls;
- analyzing EM-seq, ChIC-seq, and RNA-seq data to assess changes in genome-wide DNA methylation, RUNX1 binding, and gene expression in iPSCs derived from patients with RUNX1 mutations;
- single-cell RNA-seq analyses in peripheral blood from patients with mitochondrial disease;
- ATAC-seq, ChIP-seq, and RNA-seq in effector and memory T-cells from wild-type and pyruvate dehydrogenase-deficient T-cells to examine the chromatin and transcriptional landscape;
- eQTL analysis using RNA-seq data generated from PBMCs from mitochondrial disease patients and healthy volunteers;
- exome sequencing of LCLs from mitochondrial disease patients;
- ATAC-seq and RNA-seq of iPSC-derived neurons to identify changes in chromatin accessibility and gene expression due to mitochondrial dysfunction;
- continuing maintenance of a customized database and web interface for storing and computing on genomic data from dogs;
- design and implementation of surveys that assess the health of pet dogs whose DNA samples have been submitted to scientific studies;
- RNA-seq analyses of post-mortem brain tissue to compare neuronal gene expression in youths with a history of ADHD against matched controls in order to establish a neuronal transcriptome and determine the genes and neural gene networks that influence the development of ADHD;
- developing a website to facilitate the collection of sensitive medical data about genetic conditions, for classification and analysis utilizing AI methodologies;
- ATAC-seq and RNA-seq to initiate understanding of the phosphate sensing mechanism in animal bone cells;
- identifying integration sites of AAV in mouse and human genomes and developing methods to characterize the clustering and locations of the integration sites;
- identification of retroviral integration sites from RNA-seq data;
- annotation of immunoglobulin superfamily genes in the histocompatibility complex of Hydractinia, and variant calling aimed at mapping sex determination loci and producing fine-scale mapping data for the genomic region controlling histocompatibility in Hydractinia;
- implementation of a gene prediction pipeline and genome data portal for the preliminary annotation and analysis of the Hydractinia genome;
- implementation of a genome data portal for the Capitella teleta genome;
- developing a proteome-scale dataset resource of computationally predicted protein structures for Mnemiopsis leidyi;
- analyzing RNA-seq data comparing mouse models of MMA versus healthy controls; and
- updating the RPC variant browser to make it compliant with NIH security regulations and introduce new features.
During the reporting period, the Core also supported Labmatrix, NHGRI's clinical research database. In addition to its general use by NHGRI clinical investigators, Labmatrix is utilized for large-scale data and/or sample management for the Inherited Diseases and Caregiving Study, the Insights Microbiome/Sickle Cell Study, the ClinSeq Study, and GENE-FORECAST. NHGRI Labmatrix Support services include user training and help desk support, legacy data mapping, data validation and import, and barcoding implementation. Support staff routinely handle large datasets, import data from CRIS, develop complex queries, and generate custom data reports on behalf of database users. Finally, recognizing the importance of having a degree of facility with computational approaches, the Core continues to offer a number of courses that cover various areas of the bioinformatic landscape.