An Intelligent Concept Agent for Assisting with the Application of Metadata

$575,532 · U01 · FY2016 · HG · NIH

University of California-Lawrence Berkeley Lab, Berkeley, CA

Investigators

Linked publications & trials

Paper 36208225 Paper 29347969 Paper 28583177 Paper 27899602 Paper 27794045 Paper 27785453

Abstract

Biomedical investigators are generating increasing amounts of complex and diverse data. This data varies tremendously, from genome sequences through phenotypic measurements to imaging data. If researchers and data scientists can tap into this data effectively, we can gain insights into disease mechanisms and how to tackle them. However, the main stumbling block is that it is increasingly hard to find and integrate the relevant datasets because they lack sufficient metadata. For example, a researcher studying Crohn's disease may miss a crucial dataset on how certain microbial communities affect gut histology because the data lacks descriptive tags. Currently, applying metadata is difficult, time-consuming, and error-prone because of the vast sea of confusing and overlapping standards for each data type. Specialized 'data wranglers' are often employed to apply metadata, but even these experts are hindered by a lack of good tools. Here we propose to develop an intelligent agent that researchers and data wranglers can use to help them apply metadata. The agent is built around a personalized dashboard of metadata elements collected from multiple specialized portals, as well as from sites such as Wikipedia. These elements can be coupled with classifiers that identify the datasets to which each element may be relevant, making the selection of appropriate vocabularies easier for researchers. We will deploy the system for a number of targeted use cases, including annotation of the National Center for Biotechnology Information (NCBI) BioSample repository and annotation of images within the Figshare repository.
