Data Adaptive Estimation in Genomics and Epidemiology

$238,159 | R01 | FY2006 | GM | NIH

University Of California Berkeley, Berkeley CA

Investigators

Linked publications & trials

Paper 19333126 Paper 19255599 Paper 17971533 Paper 17875580 Paper 17806060 Paper 17630549 Paper 17450501 Paper 17402922 Paper 17133209 Paper 16617276

Abstract

DESCRIPTION (provided by applicant): The broad objective of this project is to develop, study, test, and implement new data adaptive estimation methods for research in Genomics and Epidemiology based on new theoretical results. Existing estimator selection procedures have recently been proved to underestimate the amount of information in finite sample data used for selecting an appropriate estimator of a parameter of interest. Such parameters are by definition used to directly answer Public Health questions of interest. The methods proposed will fully exploit the information contained in data to provide the best estimates of parameters of interest in statistical analyses of data collected for research in Genomics and Epidemiology. These methods will be developed for association analysis, survival analysis, causal inference, transcription factor binding site detection, microarray data analysis with or without censored data, and for point treatment and longitudinal data. Complex estimation methodologies based on estimating function approaches will be combined with the general methodology considered to provide estimates for complex longitudinal data. The estimation procedure proposed relies on three components: a unified cross-validation estimator selection methodology, construction of sieve-specific estimators, and an aggressive algorithm for generating the corresponding candidate sieve-specific estimators of a parameter of interest so as to thoroughly search the space of all possible estimators. A new method for constructing discrete sieve estimators and data adaptively selecting the corresponding best estimator will be studied and tested in comparison with the construction and selection of common continuous sieve estimators.

Ultimately this project will develop open source, computationally intensive statistical packages for use with the R and S-PLUS interfaces by researchers in Public Health. These packages will provide black box implementations of a range of data adaptive estimators for problems in Genomics and Epidemiology. They will include routines written in C to speed up the computationally intensive portions of the algorithms and will be developed with subject-matter experts to ensure adequacy for the needs of Public Health research. The routines will be applied to publicly available data or real data provided by the subject-matter experts, enabling immediate testing of the proposed methods and software. In addition, simulations imitating real data studies will allow a truthful check of the performance of the methods in comparison to the current estimation methods used. Distribution of these packages for Windows, Linux and Mac OS platforms will use the R and Bioconductor projects. These packages will include detailed documentation, examples, and data sets.
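The cross-validation selection step named in the abstract can be illustrated with a minimal sketch. It is not the project's software: it assumes a squared-error loss, a single covariate on [0, 1), and a simple histogram regression standing in for a discrete sieve indexed by the number of bins. All function names (`fit_histogram`, `cv_risk`, `select_by_cv`) and the simulated data are hypothetical and chosen only for illustration.

```python
import random


def fit_histogram(xs, ys, n_bins):
    """Fit a piecewise-constant regression on [0, 1) with n_bins equal-width bins.

    Illustrative stand-in for one sieve-specific estimator (an assumption,
    not the method described in the abstract).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall training mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]

    def predict(x):
        return means[min(int(x * n_bins), n_bins - 1)]

    return predict


def cv_risk(xs, ys, n_bins, n_folds=5, seed=0):
    """V-fold cross-validated mean squared error for one candidate estimator."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    total = 0.0
    for val in folds:
        held_out = set(val)
        train = [i for i in idx if i not in held_out]
        predict = fit_histogram([xs[i] for i in train],
                                [ys[i] for i in train], n_bins)
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in val)
    return total / len(xs)


def select_by_cv(xs, ys, candidates, n_folds=5, seed=0):
    """Return the candidate with the smallest cross-validated risk, plus all risks."""
    risks = {k: cv_risk(xs, ys, k, n_folds, seed) for k in candidates}
    return min(risks, key=risks.get), risks


# Simulated data (hypothetical): a step function at 0.5 plus Gaussian noise.
rng = random.Random(1)
xs = [rng.random() for _ in range(400)]
ys = [(1.0 if x < 0.5 else 3.0) + rng.gauss(0, 0.2) for x in xs]

candidates = [1, 2, 4, 8, 16, 64]
best, risks = select_by_cv(xs, ys, candidates)
print("selected number of bins:", best)
```

Every candidate is scored on the same folds, so the comparison reflects differences between estimators and not differences between data splits. The abstract's third component, an aggressive search over the space of candidates, is replaced here by a fixed candidate list.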
