Core B - Data Repository

$62,048 | P01 | FY2016 | DK | NIH

Washington University, Saint Louis MO

Linked publications & trials

Paper 37372419 Paper 35533274 Paper 35258509 Paper 34163075 Paper 33684031 Paper 33571448 Paper 32365167 Paper 31831058 Paper 31585844 Paper 31539500 Paper 30443602 Paper 30273610 Paper 29719869 Paper 29515039 Paper 29476613 Paper 29181446 Paper 29073107 Paper 28841195 Paper 28799899 Paper 28684495 Paper 28289731 Paper 28144630 Paper 28094011 Paper 28041931 Paper 27911843 Paper 27760558 Paper 27178263 Paper 26406373 Paper 26344404 Paper 26299453 Paper 26257300 Paper 25815983 Paper 25795776 Paper 25770346 Paper 25766684 Paper 25576328 Paper 25307765 Paper 25284151 Paper 25170151 Paper 25036628 Paper 24950202 Paper 24452263 Paper 24259713 Paper 24009397 Paper 23975157 Paper 23898195 Paper 23828941 Paper 23363771 Paper 23202435 Paper 23184592 Paper 22980325 Paper 22864264 Paper 22699611 Paper 22678395 Paper 22665442 Paper 22424233 Paper 22402401 Paper 22179717 Paper 22170430 Paper 22161565 Paper 22030749 Paper 22018228 Paper 21903626 Paper 21765408 Paper 21677749 Paper 21624126 Paper 21596990 Paper 21593810 Paper 21543530 Paper 21530737 Paper 21485746 Paper 21436049 Paper 21317366 Paper 20944220 Paper 20818378 Paper 20674856 Paper 20664551 Paper 20631792 Paper 20444704 Paper 20383131 Paper 20363958 Paper 20197316 Paper 19892944 Paper 19710709 Paper 19706296 Paper 19491241 Paper 19383763 Paper 19279067 Paper 19099591 Paper 19046431 Paper 19043404 Paper 19026936 Paper 19004758 Paper 18806222 Paper 18723574 Paper 18555187 Paper 18541218 Paper 18497261 Paper 18407065 Paper 18280814

Abstract

Core B will provide database infrastructure, coordinate deposition of data to public resources such as INSDC (the International Nucleotide Sequence Database Collaboration), and coordinate data across the overall project to facilitate comparison with other datasets and the deployment of advanced algorithms. Building on the Knight lab's extensive experience with meta-analysis, sequence databases, and data visualization, Core B will provide these methods to Projects 1, 2, and 3. It will also work closely with Core A to mirror metabolomics data and to integrate metabolomics datasets with the rest of the multi-omic data to be collected.

Core B has three Aims:
Aim 1: Organize and curate the data and metadata collected in Project 1 (from mice) and Project 2 (from humans), and ensure that analyses are reproducible in an automated fashion using virtual machines.
Aim 2: Deposit the data and metadata in standards-compliant form to INSDC, the Gene Expression Omnibus, and other resources (e.g., metabolomics repositories) as they emerge.
Aim 3: Through analyses of existing microbiome datasets, provide best-practice recommendations to investigators in Projects 1 and 2 for optimizing experimental design.

Core B will build on an extensive multi-omics data repository, funded from multiple sources, that can accommodate the types of data to be collected across the project. These include links between human subjects with defined family relationships (e.g., dizygotic twins); humanized gnotobiotic mice colonized with strains derived from these human subjects; time-series study designs in both humans and mice; combinations of data at multiple levels, including 16S rRNA gene sequencing, RNA-Seq, and metabolomics; and other advanced features of this complex project.
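To make the linked-sample requirements concrete, the following is a minimal sketch of how such relationships could be represented and queried: human subjects grouped into families, gnotobiotic mice linked to the human donor whose microbiota colonized them, and samples placed on a time series and tagged by assay. All class names, field names, and the `samples_for_family` helper are hypothetical illustrations, not the repository's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical metadata model; names and fields are illustrative assumptions only.


@dataclass(frozen=True)
class HumanSubject:
    subject_id: str
    family_id: str                      # groups related subjects, e.g. a twin pair
    relationship: Optional[str] = None  # e.g. "dizygotic_twin"


@dataclass(frozen=True)
class Mouse:
    mouse_id: str
    donor_id: str  # HumanSubject whose microbiota colonized this gnotobiotic mouse


@dataclass(frozen=True)
class Sample:
    sample_id: str
    host_id: str        # either a HumanSubject.subject_id or a Mouse.mouse_id
    timepoint_day: int  # position of the sample in the time series
    assay: str          # e.g. "16S", "RNA-Seq", "metabolomics"


def samples_for_family(family_id, humans, mice, samples):
    """Return samples from a family's human subjects plus the mice they colonized,
    ordered by host, then time point, then assay."""
    member_ids = {h.subject_id for h in humans if h.family_id == family_id}
    # Follow the donor link from each mouse back to the human family members.
    mouse_ids = {m.mouse_id for m in mice if m.donor_id in member_ids}
    host_ids = member_ids | mouse_ids
    return sorted(
        (s for s in samples if s.host_id in host_ids),
        key=lambda s: (s.host_id, s.timepoint_day, s.assay),
    )


# Small illustrative dataset (hypothetical identifiers).
humans = [
    HumanSubject("H1", "F1", "dizygotic_twin"),
    HumanSubject("H2", "F1", "dizygotic_twin"),
    HumanSubject("H3", "F2"),
]
mice = [Mouse("M1", "H1"), Mouse("M2", "H3")]
samples = [
    Sample("S1", "H1", 0, "16S"),
    Sample("S2", "M1", 7, "RNA-Seq"),
    Sample("S3", "M1", 0, "16S"),
    Sample("S4", "M2", 0, "16S"),
    Sample("S5", "H2", 0, "metabolomics"),
]

result = samples_for_family("F1", humans, mice, samples)
print([s.sample_id for s in result])  # ['S1', 'S5', 'S3', 'S2']
```

Keeping the donor link on the mouse record, rather than copying family information onto every sample, means a single query can traverse from a twin pair to all human and mouse samples derived from it, across assays and time points.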
A key component of our approach is to let the investigators who collect the datasets perform their own first-pass analyses, while also making the data available more broadly within the project so that additional advanced analyses, such as those being developed in Project 3, can be applied. The data will also be made available to the public in a relatively user-friendly form, supplementing deposition in permanent, government-backed sequence data repositories.
