Computational Biology Core

$231,853 · P42 · FY2008 · ES · NIH

University Of California Berkeley, Berkeley CA

Abstract

The support provided under Core D reflects a growing shift in studies of environmental exposure, from more traditional epidemiological studies and simple experimental designs to high-dimensional biology, with its emphasis on 'omic' technologies and complicated questions addressing the possible interaction of environmental exposures with high-dimensional measures of the genome, proteome, etc. These high-dimensional data sets are characterized by many (thousands of) measurements made on only a few independent units (e.g., people). Core D thus reflects a parallel evolution in the field of biostatistics toward methodologies that can both find patterns in high-dimensional data sets and provide proper statistical inference for those patterns. Besides offering consulting on traditional epidemiological experimental design and analysis questions, Core D will focus its efforts on providing the most relevant and rigorous statistical techniques to the Program's projects. With new 'omic' technologies, biology has entered a new, more empirical phase in which the goals of the research are ambitious (e.g., discovery of regulatory gene networks affected by particular environmental toxicants) but the sample sizes are relatively small (biological replicates numbering in the tens). With these technologies has also come a proliferation of proposed methods for finding biologically meaningful patterns, typically with little theory provided to guide their relative worth.

The goal of this Core is to provide the project researchers with the best techniques available, software to help implement them, a computational environment that can handle computer-intensive methods on large data sets and, most importantly, rigorous statistical inference for the parameters estimated by these procedures. The developments related to the proliferation of high-dimensional biological/epidemiological data that are particularly relevant to this proposal are 1) multiple testing, 2) machine learning and loss-based estimation, 3) grouping algorithms, 4) causal inference, and 5) biological metadata and systems biology. In addition, the Core will provide access to a computational environment that lends itself to the computationally intensive methods developed for data mining and resampling-based inference.
