Data Science and Sharing Team
National Institute of Mental Health
Investigators
Linked publications & trials
Abstract
Data Sharing
In January of 2023, the NIH implemented a new data sharing policy to promote scientific data sharing. Our team has taken a lead role in advising researchers on how to prepare their data management and sharing plans (DMSPs) and organize their datasets for sharing in public repositories. In July of 2025, the DSST held multiple weeks of office hours to help groups with the new DMSP compliance reporting requirements.
In collaboration with the NIMH Clinical Director's Office, the DSST prepared an update to the second data release for the Healthy Research Volunteers Protocol (NCT03304665) dataset. This update includes additional arterial spin labeling (ASL) data from 234 participants and can be found on the OpenNeuro repository (https://openneuro.org/datasets/ds004215).
The DSST organized and uploaded two datasets from Chris Baker's group to the OpenNeuro repository. The first dataset consists of anatomical magnetic resonance (MR) images and magnetoencephalography (MEG) data collected while 18 participants viewed static and dynamic faces. This dataset will be made public upon publication of the corresponding paper. The second dataset consists of extensive functional and structural MR images collected while participants viewed low-level visual stimuli (https://openneuro.org/datasets/ds005521).
In collaboration with Simone Haller and Daniel Pine's group, the DSST assisted in preparing eight datasets to be shared. The first two datasets consist of structural and functional MRI, neurocognitive, clinical, and demographic data from 149 and 124 participants, respectively. These datasets have been uploaded to OpenNeuro (https://openneuro.org/datasets/ds006303 and https://openneuro.org/datasets/ds005754). Five of the datasets contain structural and functional MRI, neurocognitive, clinical, and demographic data and are in various stages of preparation.
The remaining dataset examines relationships between sleep and irritability within a pediatric sample and is publicly accessible on the Open Science Framework (OSF) repository (https://osf.io/8nt32).
In collaboration with Karen Berman's and Peter Schmidt's groups, we prepared and shared a dataset that includes structural MRI data from 24 eight-year-old participants (https://openneuro.org/datasets/ds006267). In collaboration with Peter Schmidt's group, we are curating a behavioral and clinical dataset of participants with and without premenstrual dysphoric disorder that will soon be available on the OSF repository (https://osf.io/bvdqe).
In collaboration with Carlos Zarate's group, the DSST is assisting with the preparation and sharing of a dataset consisting of five sessions from 58 participants, collected before and after ketamine treatment for patients with major depressive disorder (https://openneuro.org/datasets/ds005917).
Finally, in collaboration with Ted Usdin's Systems Neuroscience Imaging Resource, the DSST assisted in curating a dataset of deconvolved lightsheet microscopy images and analysis and sharing it on the DANDI Archive (https://dandiarchive.org/dandiset/001362).
Data Curation
The DSST continues to provide the IRP with access to multiple large, publicly available datasets. We maintain a comprehensive list of these datasets on our website (http://cmn.nimh.nih.gov/dsst). To date, we maintain over 125,000 MRI scan sessions across 33 different datasets. This year, our most requested datasets were the UK Biobank and the Adolescent Brain Cognitive Development (ABCD) study.
Two new datasets were added to our collection by request. First, Stefano Marenco of the Human Brain Collection Core requested a dataset from the NeMO repository that indexes inter-individual variation in human cortical cell type abundance and expression.
Second, Daniel Glen of the Statistical and Scientific Computing Core requested a dataset that includes structural MR images from healthy volunteers and patients with spinocerebellar ataxia to validate a novel cerebellar atlas.
Training
The DSST continually provides ad hoc training while consulting with researchers and trainees throughout the NIH intramural program. This year, the DSST held weekly informal "Lunch and Learn" discussions and presentations on various aspects of open science, computer science, and general topics relevant to the NIMH community.
Mia Zwally, a postbac trainee on our team, completed the analysis for a project examining the relationship between brain function and cognition. Her aim was to replicate previous work within a more heterogeneous developmental sample while also extending previous work from our group. The manuscript is currently in preparation, and a pre-registration of her methods can be found on OSF (https://osf.io/t9dhk).
Josh Lawrimore and other DSST members have provided ad hoc support and training on the tools of open science, including assisting the groups of Drs. Bandettini, Raznahan, Tejeda, and Li with using Python for scientific coding and with best practices in version control, reproducibility, and GitHub usage. In addition, the DSST's newest member, Ashley Ptinis, provided support to Carlos Zarate's group on the tools used to convert raw imaging data into a standard format for data sharing.
The DSST continues to support the Brain Imaging Data Structure (BIDS) standard and the community that uses it. Eric Earl and Anthony Galassi serve on the BIDS Maintainers committee, and Eric recently hosted the yearly BIDS Town Hall meeting to update the community on the project and receive feedback. Eric is a lead on the BIDS Extension Proposal BEP036 for Phenotypic Data Guidelines, which is nearing its final stage before integration into the next release of the BIDS specification (https://bids.neuroimaging.io/extensions/beps/bep_036.html).
Collaborations and projects
In our collaboration with Dr. Raznahan's group, we continue to assist with processing structural brain images from over 60,000 subjects within the UK Biobank to examine complex relationships between genome and brain structure. The DSST is also assisting with obtaining and organizing functional MRI derivatives and phenotypic data from almost 60,000 participants to examine sex differences in brain function.
In an ongoing project in the group, Josh Lawrimore is investigating the state of data sharing in the scientific literature by identifying data sharing statements within the full text of manuscripts stored on PubMed Central. He is currently examining trends in data sharing as a function of funding agency.
In collaboration with Dr. Tejeda's Unit on Neuromodulation and Synaptic Integration, Josh Lawrimore has developed an API wrapper for the LabArchives Electronic Lab Notebook (ELN) and a series of user tools to access and interact with ELNs. This effort significantly simplifies previous workflows and allows for the automated tracking of data and metadata from fiber photometry studies.
The DSST has engaged in two collaborations with the Machine Learning Team (MLT). First, Eric Earl and Dustin Moraczewski assisted with cleaning and formatting a behavioral dataset of participants with and without premenstrual dysphoric disorder from Peter Schmidt's group in preparation for the MLT to model diagnostic classification. Second, Josh Lawrimore and Ashley Ptinis converted an R package written by the MLT's Gabe Loewinger into a Python package, which is now publicly available to install from the PyPI package repository. This code reproduces the analysis of fiber photometry data from a previous study and can be found on GitHub (https://github.com/nimh-dsst/fast-fmm-rpy2).
In collaboration with Danny Pine's group, Dustin Moraczewski assisted a graduate student, Corey Richier, with the preprocessing of multiple large resting-state fMRI datasets for a machine learning study that predicts brain age and examines how it may differ in clinical populations.
Finally, in collaboration with Tonya White's group, Dustin Moraczewski participated in a project that examines the efficacy of machine learning methods by analyzing simulated data. The data resulting from this project have been published in Scientific Data (https://www.nature.com/articles/s41597-025-04740-3).
View original record on NIH RePORTER →