Semiparametric regression methods for cross-sectional and longitudinal data with errors-in-covariates

$149,679 | FY2001 | SBE | NSF

University Of Chicago, Chicago IL

Investigators

Abstract

This project will develop new statistical methodology to address the problem of measurement error in regression covariates. The method is embedded in a very general quasilikelihood framework that requires specification of only the mean and variance of a response variable y given covariates (u,w), and it therefore encompasses a broad class of models. The covariate u is not observed directly; instead, it is measured through a surrogate x. As for y, a model for the mean and variance of x given (u,w) is also posited. In the first phase of the project, the covariates u will be treated as fixed "nuisance parameters": quantities that are not of interest to the researcher but are required to fit the model of interest. The research will exploit recent developments in statistical methodology for eliminating nuisance parameters from estimating functions (M-estimators). An alternative approach is to treat the mismeasured covariates u as unobserved random variables. This approach can increase statistical efficiency, but it may be sensitive to the assumptions made about the distribution of u given the other covariates w. The second phase of the project will develop a robust approach to errors-in-covariates in which a working model for the distribution of u given w is used to increase efficiency while protecting against bias due to misspecification of that distribution. This work will extend the nuisance parameter approach. The main area of application is non-linear regression models, and models for longitudinal data will receive particular attention. This will be accomplished by exploiting the natural connection between generalized estimating equation models for longitudinal response data and quasilikelihood models for univariate responses.

A very large proportion of statistical methods in use today are based on regression models, which express the average value of a response variable as a function of given values of other covariates. Questions of scientific interest can then be formulated in terms of how the average response varies across a range of covariate values. In standard regression models, noise in the data is assumed to occur as unexplained variability in the response around its mean value, and the covariates are assumed to be free of such variability. For example, age and sex are usually measured very accurately. However, some covariates are often prone to considerable measurement error. Errors-in-covariates arise especially in observational studies, which often rely on self-report or other imperfect measures for variables such as dietary or alcohol intake, workplace or environmental exposures, clinical measures (e.g., blood pressure), and income and wealth. Errors may be due to reporting biases, variability in recall, laboratory variability, intra-individual variability over time, inter-informant error, or differences in respondents' perspectives. This research project will develop a new method for estimating non-linear regression models when one or more covariates are subject to error. The project is important because (i) it represents the first application of a new technique for statistical estimation in a broad class of problems that includes errors-in-covariates as a special case; (ii) the technique requires fewer modeling assumptions than previously developed ones; and (iii) the approach will be extended to longitudinal data analysis, where errors-in-covariates have received relatively little attention. Many problems in the social sciences and public health require longitudinal data, so advanced regression methods for analyzing these data are critical to furthering these inquiries.
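The quasilikelihood setup and the errors-in-covariates problem described in the abstract can be illustrated with a small simulation. The sketch below is a minimal, hypothetical example, not the project's method: it assumes a log-linear mean E(y | u) = exp(b0 + b1*u) with variance function V(mu) = mu, a standard normal true covariate u, and a surrogate x = u + e with independent normal error. The helper names `poisson_draw`, `quasi_score_fit`, and `simulate` are illustrative assumptions. The fit solves the quasi-score equation sum_i D_i' V_i^{-1} (y_i - mu_i) = 0 by Fisher scoring, first using the true u and then naively substituting the surrogate x. It does not implement any corrected estimator; it only shows the bias that motivates one.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here.
    Poisson is only a convenient generator: the fit below uses only the mean and variance."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def quasi_score_fit(z, y, tol=1e-10, max_iter=50):
    """Solve sum_i D_i' V_i^{-1} (y_i - mu_i) = 0 for mu_i = exp(b0 + b1*z_i), V(mu) = phi*mu.
    Here D_i = mu_i * (1, z_i), so D_i' V_i^{-1} = (1, z_i) / phi and phi cancels."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        u0 = u1 = a = b = d = 0.0
        for zi, yi in zip(z, y):
            mu = math.exp(b0 + b1 * zi)
            r = yi - mu
            u0 += r               # quasi-score, intercept component
            u1 += zi * r          # quasi-score, slope component
            a += mu               # 2x2 information matrix entries
            b += mu * zi
            d += mu * zi * zi
        det = a * d - b * b
        s0 = (d * u0 - b * u1) / det   # Fisher-scoring step: I^{-1} U
        s1 = (a * u1 - b * u0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1


def simulate(n=20000, beta=(0.5, 0.5), error_sd=1.0, seed=1):
    """Hypothetical data: true covariate u, error-prone surrogate x, response y."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [ui + rng.gauss(0.0, error_sd) for ui in u]
    y = [poisson_draw(math.exp(beta[0] + beta[1] * ui), rng) for ui in u]
    return u, x, y


u, x, y = simulate()
_, slope_true = quasi_score_fit(u, y)    # fit with the unobservable true covariate
_, slope_naive = quasi_score_fit(x, y)   # naive fit substituting the surrogate
print(f"slope using true u:      {slope_true:.3f}")
print(f"slope using surrogate x: {slope_naive:.3f}")
```

Under these assumptions, E(y | x) is itself log-linear in x with slope b1 * var(u) / (var(u) + var(e)), so the naive slope should land near 0.25 rather than the true 0.5, while the fit using u should recover roughly 0.5. Only the mean and variance functions enter the estimating equation, which is what lets the quasilikelihood framework cover a broad class of models.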
