Model and Data Sharing Core
Duke University, Durham NC
Investigators
Abstract
The Model and Data Sharing Core (MDSC) will develop novel infrastructure and methods to: (1) promote access to and sharing of multimodal models and data arising from the three Research Projects (RP1-3) within the Center of Excellence (CoE); (2) innovate and advance ways of achieving FAIR sharing of Infectious and Immune-mediated Disease (IID) models; and (3) develop and maintain an infrastructure that can be scaled to handle emergency computational workloads in IID modeling in case of unexpected infectious disease outbreaks, epidemics, or pandemics. While primarily designed for use with CoE models and data, the infrastructure will be extensible to support the network of IID modeling groups sponsored by NIAID. The MDSC will accomplish this through three aims. In Aim 1, we will build an artificial intelligence-based navigator service for navigating models and datasets, retrieving information, and providing coding assistance for model execution. Termed CAIRNS (Comprehensive AI Resource Navigator Service), the AI agent will serve as a universal portal for navigating and accessing various immunology databases and for analyzing the models and data retrieved from them. We will build the agent by fine-tuning general-purpose large language models (LLMs) with domain knowledge of immunology and immunology databases. We will also develop rigorous benchmarking datasets to evaluate the accuracy and reliability of the AI agent. In Aim 2, we will create an innovative infrastructure that advances FAIR data and model sharing for the IID community. This infrastructure includes the iRODS (Integrated Rule-Oriented Data System) and HeLx platforms. iRODS is a rules-driven data management platform that is widely used in biomedical data repositories. iRODS can also serve as a workflow manager: rules invoke the steps of a workflow in the right sequence and ensure that adequate metadata is captured at each step, preserving the provenance of the results.
Metadata templates will be built using the Center for Expanded Data Annotation and Retrieval (CEDAR) framework. The strength of iRODS is that different apps/tools can be invoked for each step of the workflow depending on the type of data/model being deposited (scRNA-seq, CODEX, ABM, neural network, etc.). The HeLx platform accelerates scientific discovery by incorporating data science tools into research, simplifying data sharing, collaboration, analysis, and management. Through a comprehensive software framework, HeLx connects research communities with cloud-based computational capabilities and specialized science workspaces. In Aim 3, we will provide infrastructure to support MDSC resources for Opportunity Fund awardees, who may not have the data and compute infrastructure needed to complete their research. The MDSC will support these awardees by providing technical expertise to initiate their projects and by ensuring that they are able to share their data/models using the MDSC. The totality of infrastructure built and supported by the MDSC will serve users within Duke and the region, as well as IID modelers sponsored by NIAID.
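The rule-driven workflow idea described under Aim 2 (steps chosen by the type of data/model deposited, with metadata captured at each step for provenance) can be illustrated with a minimal sketch. This is plain Python and does not use the iRODS rule language or any iRODS API; the step functions, the rule table, the required metadata fields, and the `Deposit` class are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical step type: a step reads the deposit's metadata and returns
# the metadata fields it contributes.
Step = Callable[[dict], dict]


def validate_format(meta: dict) -> dict:
    return {"format_checked": True}


def annotate_assay(meta: dict) -> dict:
    return {"assay": meta["data_type"]}


def register_model(meta: dict) -> dict:
    return {"model_registered": True}


# Illustrative rule table (an assumption, not the MDSC's actual rules):
# data/model type -> ordered list of steps to invoke.
RULES: dict[str, list[Step]] = {
    "scRNA-seq": [validate_format, annotate_assay],
    "ABM": [validate_format, register_model],
}

# Assumed minimum metadata that every deposit must carry.
REQUIRED_FIELDS = ("data_type", "depositor")


@dataclass
class Deposit:
    name: str
    metadata: dict
    provenance: list = field(default_factory=list)


def run_workflow(deposit: Deposit) -> Deposit:
    """Run the steps selected by the deposit's type, logging provenance."""
    missing = [f for f in REQUIRED_FIELDS if f not in deposit.metadata]
    if missing:
        raise ValueError(f"missing required metadata: {missing}")
    data_type = deposit.metadata["data_type"]
    if data_type not in RULES:
        raise ValueError(f"no rule defined for data type: {data_type}")
    for step in RULES[data_type]:
        added = step(deposit.metadata)
        deposit.metadata.update(added)
        # Record which step ran, what it added, and when.
        deposit.provenance.append({
            "step": step.__name__,
            "added": sorted(added),
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return deposit
```

Calling `run_workflow(Deposit("pbmc", {"data_type": "scRNA-seq", "depositor": "lab-a"}))` would run the two scRNA-seq steps in order and leave one provenance entry per step; a deposit lacking a required field is rejected before any step runs.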