Center for Alzheimer's and Related Dementias (CARD): Harmonized Data-Derived Resources for the Alzheimer's Disease and Related Dementias Community

Funding: $11,628,452 | Activity code: ZIA | Fiscal year: 2022 | Institute: AG | Agency: NIH

National Institute on Aging

Linked publications, trials & patents

Papers (PubMed IDs): 39207075, 39030740, 38940474, 38923692, 38889728, 38853922, 38849413, 38829682, 38770829, 38487808, 38181731, 38066663, 37198259, 36669485, 35365675

Abstract

CRISPRBrain (https://crisprbrain.org/) is a major effort in harmonized data for the analysis, interpretation, and discovery of edited cell lines, focusing on gene expression data, with an upcoming expansion to lipidomics and proteomics data. It is a field-leading platform, with 8,489 unique accesses over the past year and dozens of weekly returning users. The platform has an open application programming interface (API) that allows users to stream and analyze data. It has also facilitated insights into the role of ferroptosis in neurodegeneration (Tian et al., 2021). The resource has also integrated novel CROP-seq datasets, facilitating the analysis and sharing of screens and providing insight into potentially druggable regulators of microglial function in neurodegenerative diseases (Drager et al., 2022). As a complement to CRISPRBrain, we are building a resource that will democratize network-based functional inferences in neurodegenerative diseases. This multi-omic functional inference engine builds on our recent publication showing multi-omic signatures of risk at the GRN gene across neurodegenerative diseases (Nalls et al., 2021). That work involved harmonizing and analyzing dozens of genome-wide QTL and GWAS resources, as well as underlying participant-level genomics data ranging from bulk RNA sequencing and ATAC-seq to single-cell transcriptomics. Our goal is to release this as an easily browsable web resource by the end of 2022 (https://github.com/NIH-CARD/OmicSynth). GenoML (https://genoml.com/) is an open-source automated machine learning tool for the harmonization and analysis of genomics data. It includes novel methods for alleviating classifier inaccuracy due to stochastic genetic variation in multi-modal datasets.
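The functional inference engine combines QTL and GWAS summary statistics at shared variants. One widely used way to do this at a single instrument variant is summary-data-based Mendelian randomization (SMR). The sketch below assumes that approach purely for illustration; it is not presented as the OmicSynth implementation, it omits the HEIDI heterogeneity test that usually accompanies SMR, and the function names and input numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantEffect:
    """Effect size (beta) and standard error of one variant in one summary-statistics file."""

    beta: float
    se: float

    @property
    def z(self) -> float:
        """Wald z-statistic, beta / se."""
        return self.beta / self.se


def chi2_1df_sf(t: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))


def smr_single_instrument(qtl: VariantEffect, gwas: VariantEffect) -> tuple[float, float, float]:
    """Single-instrument SMR at a variant z shared by a QTL study (trait x) and a GWAS (disease y).

    Returns (b_xy, t_smr, p_value):
      b_xy  = b_zy / b_zx, the estimated effect of the molecular trait on disease;
      t_smr = z_zx^2 * z_zy^2 / (z_zx^2 + z_zy^2), approximately chi-square(1) under the null.
    Both inputs are assumed (not checked) to be aligned to the same effect allele.
    """
    z_zx2 = qtl.z ** 2
    z_zy2 = gwas.z ** 2
    b_xy = gwas.beta / qtl.beta
    t_smr = (z_zx2 * z_zy2) / (z_zx2 + z_zy2)
    return b_xy, t_smr, chi2_1df_sf(t_smr)


# Illustrative, made-up inputs: a strong QTL (z = 10) and a moderate GWAS signal (z = 5).
b_xy, t_smr, p_value = smr_single_instrument(VariantEffect(0.5, 0.05), VariantEffect(0.1, 0.02))
print(f"b_xy={b_xy:.3f}  T_SMR={t_smr:.2f}  p={p_value:.2e}")
```

Note that T_SMR is symmetric in the two z-scores and is limited by the weaker of the two signals, which is why a strong QTL alone cannot produce a significant SMR result.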
Proof-of-concept work across multiple multi-modal datasets from the AMP-PD project has delivered deployable classifiers for predicting PD onset and has succeeded in democratizing complex machine and deep learning workflows (Makarious et al., 2022). We are using GenoML to build open-source models for improved clinical trial recruitment in the ADRD space. GenoML has been accessed by 5,460 unique users in the past year. A major collaborative project is underway with Konica Minolta and its Invicro subsidiary to standardize and harmonize longitudinal imaging data from the UK Biobank and multiple neurodegenerative disease-specific resources, covering hundreds of thousands of brain MRI images (https://invicro.com/invicro-dti-and-nih-bring-imaging-and-genomics-data-together/). Early deliverables include machine learning-derived maps of brain atrophy and the integration of these outcomes with genomics data, with the resulting code, data, and manuscripts shared publicly. CARD has curated all currently available public-domain AD/ADRD genetics data, as well as deeper molecular data. To make these data easily discoverable across repositories, including local NIH resources (the Biowulf cluster) and commercial cloud resources (Terra.bio and the Alzheimer's Disease Data Workbench), we have built a lightweight tool for locating the data we have curated, based on the existing REDCap infrastructure at NIA. We are building an easy-to-use, low-cost tool called the Data file Inventory and Verification Environment for Research (DIVER). DIVER interfaces with the National Library of Medicine's Common Data Elements (CDE) library to aid in harmonizing AD/ADRD-relevant data (https://cde.nlm.nih.gov/home). DIVER also supports ongoing harmonization projects with the University of Mississippi Medical Center on extant NIA studies of cognitive aging, such as the Baltimore Longitudinal Study of Aging and the Health, Aging, and Body Composition Study.
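The record does not describe how DIVER inventories or verifies files. As a hypothetical illustration of what "inventory and verification" can mean for research data, the sketch below records each file's size and SHA-256 checksum and later reports missing, unexpected, and changed files. The `FileRecord` schema and function names are our own assumptions; nothing here uses REDCap or the NLM CDE library.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    """One inventory entry (hypothetical schema, not DIVER's)."""

    relative_path: str
    size_bytes: int
    sha256: str


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large genomics files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> dict[str, FileRecord]:
    """Walk a directory tree and record size and checksum for every regular file."""
    inventory = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        inventory[rel] = FileRecord(rel, path.stat().st_size, sha256_of(path))
    return inventory


def verify_inventory(root: Path, inventory: dict[str, FileRecord]) -> dict[str, list[str]]:
    """Compare the current contents of `root` against a previously saved inventory."""
    current = build_inventory(root)
    shared = set(inventory) & set(current)
    return {
        "missing": sorted(set(inventory) - set(current)),
        "unexpected": sorted(set(current) - set(inventory)),
        "changed": sorted(r for r in shared if inventory[r].sha256 != current[r].sha256),
    }
```

In practice the inventory would be persisted (for example as a table alongside curated metadata) so that a copy of a dataset on a cluster or cloud workspace can be checked against the original.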
In particular, the CDEs curated for DIVER have accelerated early work by collaborators at the UK Dementia Research Institute on automated metadata harmonization across global repositories. We will adopt current gold-standard tools following an internal systematic review of similar public offerings. Curated and harmonized data (including deep molecular data from other projects) and tools have been shared, as appropriate, via GitHub, Terra.bio, and the Alzheimer's Disease Data Workbench. As part of this data harmonization strategy, we aimed to facilitate biobank-scale collaborations by standardizing electronic medical record codes across the UK Biobank, the Finnish Biobank, and the All of Us study, with special attention to AD/ADRD-relevant data. We are beginning collaborations with the Welsh Biobank in Cardiff (SAIL) to carry out similar harmonization and analysis efforts. We have analyzed viral exposures associated with neurodegeneration risk up to 15 years before disease manifestation. We identified and replicated 22 novel associations between viruses and neurodegenerative diseases in over 500,000 biobank samples, and replicated the previously reported association between Epstein-Barr virus exposure and multiple sclerosis (Levine et al., 2022). Longitudinal data harmonization and analysis pose a unique set of challenges. We have built a democratized, easily deployable longitudinal data analysis pipeline tailored to genomics data. We are expanding its functionality and usability to identify genetic associations with AD/ADRD-related imaging and CSF biomarkers, providing insight into the genetics of disease progression (https://longitudinal-gwas-pipeline.readthedocs.io/en/latest/). Proofs of concept for this pipeline include evaluations of cognitive decline in Parkinson's disease datasets, as well as studies of mortality and depressive symptoms (Tan et al., 2020).
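To make the biobank harmonization and exposure analysis concrete, the sketch below maps raw ICD-10 codes to a small set of harmonized phenotype labels by three-character category, then computes a crude odds ratio with a Wald 95% confidence interval from the resulting 2x2 table. The mapping table, function names, and counts are hypothetical. Real biobanks use different coding systems and versions (for example ICD-10-CM in the United States), and a crude odds ratio that ignores exposure timing and covariates is a simplification, not the model used by Levine et al. (2022).

```python
import math

# Hypothetical harmonization table (an illustration, not a curated resource):
# 3-character ICD-10 category -> harmonized phenotype label.
ICD10_CATEGORY_TO_PHENOTYPE = {
    "G30": "alzheimers_disease",
    "G20": "parkinsons_disease",
    "G35": "multiple_sclerosis",
    "B27": "infectious_mononucleosis",  # used here as an Epstein-Barr virus exposure proxy
}


def harmonize_codes(raw_codes):
    """Map raw ICD-10 codes written as 'G30.1' or 'G301' to harmonized labels."""
    labels = set()
    for code in raw_codes:
        category = code.replace(".", "").strip().upper()[:3]
        label = ICD10_CATEGORY_TO_PHENOTYPE.get(category)
        if label is not None:
            labels.add(label)
    return labels


def crude_odds_ratio(participants, exposure, outcome, z=1.96):
    """Crude odds ratio with a Wald confidence interval from per-participant label sets.

    Ignores timing (e.g. exposure windows before diagnosis) and covariates.
    Returns (odds_ratio, lower, upper); raises if any 2x2 cell is empty.
    """
    a = b = c = d = 0  # a/b: exposed cases/controls, c/d: unexposed cases/controls
    for labels in participants:
        exposed = exposure in labels
        case = outcome in labels
        if exposed and case:
            a += 1
        elif exposed:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    if 0 in (a, b, c, d):
        raise ValueError("empty 2x2 cell; consider an exact test instead")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Harmonizing before counting is the point of the design: once every biobank's records are reduced to the same labels, the same association code runs unchanged on each cohort, which is what makes cross-biobank replication straightforward.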
In parallel with our work on genetic clustering across diseases mentioned above, we have also used harmonized clinical and genomic data to identify progression phenotypes in ALS/FTD and PD, with work on Lewy body dementia and AD underway (Faghri et al., 2022). Spanning data management and discoverability through aggregation and harmonization, the CARD Advanced Analytics proof of concept for this aspect of our work is our ongoing multi-ancestry analysis of Alzheimer's disease genetic risk. This project quantifies risk heterogeneity across diverse continental ancestries, evaluates the generalizability of risk prediction, and identifies two novel risk loci, while leveraging genetic diversity to fine-map genetic risk at nine loci. Finally, harmonized datasets, tools, and web resources are useful only if the research community can actually use them. CARD Advanced Analytics has been working with external collaborators and CARD's own newly formed Training Team to support hackathons, office hours, and one-on-one sessions with researchers from a variety of backgrounds, not only to show them the available resources but also to help them understand and use these resources efficiently. Our goal is to help democratize complex biomedical data science research at CARD and to understand the needs of the research community we are part of. Several preprints have also resulted from this work:

1. Dadu, A. et al. Identification and prediction of Parkinson's disease subtypes and progression using machine learning in two cohorts. bioRxiv 2022.08.04.502846 (2022). doi:10.1101/2022.08.04.502846.
2. Levine, K. et al. Virus exposure and neurodegenerative disease risk across national biobanks. medRxiv 2022.07.08.22277373 (2022). doi:10.1101/2022.07.08.22277373.
3. Tan, M. M. X. et al. Genome-wide determinants of mortality and clinical progression in Parkinson's disease. medRxiv 2022.07.07.22277297 (2022). doi:10.1101/2022.07.07.22277297.
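The multi-ancestry analysis described above quantifies risk heterogeneity across continental ancestries. A standard way to express such heterogeneity at a single locus is a fixed-effect inverse-variance-weighted (IVW) meta-analysis of per-ancestry effect sizes, together with Cochran's Q and I². The sketch below assumes that textbook approach for illustration only; the function names and example numbers are hypothetical, and the actual analysis may use different models.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def ivw_meta(betas: list[float], ses: list[float]) -> dict[str, float]:
    """Fixed-effect IVW meta-analysis with Cochran's Q and I^2.

    betas, ses: per-ancestry effect sizes (e.g. log odds ratios) and standard errors.
    Returns the pooled beta and SE, Q, its chi-square(k - 1) p-value, and I^2 as a fraction.
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two matching (beta, se) pairs")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "beta": pooled,
        "se": math.sqrt(1.0 / sum(weights)),
        "Q": q,
        "Q_p": chi2_sf(q, df),
        "I2": i2,
    }


# Made-up per-ancestry log odds ratios and standard errors at one locus.
print(ivw_meta([0.30, 0.25, 0.10], [0.05, 0.10, 0.08]))
```

A small Q p-value or a large I² flags a locus whose effect size differs by ancestry; random-effects or ancestry-aware models are common next steps when that happens.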