Statistical Inference under Subjective and Not-Fully-Quantifiable Information on Experimental Units

$200,000 | FY2006 | MPS | NSF

Ohio State University Research Foundation, Columbus, OH

Investigators

Abstract

In many scientific investigations, there is a wealth of subjective information about the experimental material (which agricultural plots are more fertile, which patients are healthier, and so on) that is difficult to quantify as a formal numerical variate for use in statistical analysis. At present, statistical science has no effective way to exploit this information. This research develops a body of techniques that allow one to recover the bulk of this information and to use it in a fully objective fashion. Importantly, the same mathematical techniques that recover subjective information also allow one to exploit "lesser-quality" covariates, which may be subject to substantial measurement error or other biases. These lesser-quality covariates are used to create an artificial stratification among the potential experimental units before their responses are measured. The body of work underlying these techniques is known as ranked set sampling. Its current theory rests on very strong assumptions, such as perfect ranking or other precisely specified probability and judgment-ranking models. Even a small departure from these assumptions may yield inconsistent estimators whose substantial bias persists even asymptotically. Moreover, because standard ranked set sampling ranks units in sets but measures the response of only one unit per set, there are reservations within the research community about its use when the cost of recruiting a unit for a study is substantial or when the number of available experimental units is limited. In these situations, it is desirable to use all available experimental units in the experiment. To address these concerns, this research looks at ranked set sampling from a different perspective and identifies three areas where the current state of the art is inappropriate, performs poorly, or cannot be used in a satisfactory general fashion: (i) the use of ranked set sampling in the design of experiments, (ii) the development of low- and medium-structure parametric estimation under minimal judgment-modeling assumptions, and (iii) the development of models for the ranking process.

This research will have a substantial impact on statistical analysis, on several established areas of scientific research, on the quality of life of the U.S. populace, and on the scientific infrastructure of the country. Through the mechanisms described in the preceding paragraph, the research will allow researchers to extract more information from their experiments with novel statistical analyses. The flip side of extracting more information from an experiment is the ability to obtain a given amount of information from a smaller experiment. The research will also develop better designs for performing experiments, with impact across a wide variety of disciplines. As a prime example, with the new designs, a clinical trial used to establish the benefits of a new drug will require fewer subjects. The cost of the trial will be reduced, and the trial will be completed in a shorter time. The triple benefit of a smaller, cheaper and quicker trial will expedite the development and approval of new drugs, including those for terminal diseases such as cancer. In addition to the economic benefits, there is the ethical benefit of moving promising new therapies through the development process as quickly as possible, without sacrificing current safeguards in the approval process. The scientific infrastructure of the country will be enhanced by increased collaboration between statisticians and medical researchers and by the rigorous training of graduate students. Particular emphasis will be given to the training of women and minorities in the mathematical sciences.
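The ranking-based stratification and the sensitivity to ranking assumptions described in the abstract can be illustrated with a small simulation. This is a minimal Python sketch under assumptions that are not part of the award: the response is exponential with mean 1, the cheap covariate is a weighted sum of the response and Gaussian noise (with correlation `rho` to the response), the set size is k = 3, and the functions `ranked_set_sample`, `rss_mean` and `top_rank_estimate` are hypothetical. It contrasts the plain ranked-set-sample mean, which remains unbiased when rankings are imperfect, with a simple parametric estimator that is valid only under perfect ranking.

```python
import math
import random
import statistics


def ranked_set_sample(k, cycles, rho, rng):
    """Balanced ranked set sample with judgment ranking by a noisy covariate.

    Hypothetical simulation setup (not from the award): the response Y is
    Exponential(mean 1), and the covariate is rho*Y + sqrt(1 - rho^2)*Z with
    Z ~ N(0, 1), so its correlation with Y is rho. rho = 1 gives perfect
    ranking; rho < 1 gives imperfect ranking.
    Returns a list of (judged_rank, measured_response) pairs.
    """
    sample = []
    for _ in range(cycles):
        for r in range(1, k + 1):
            # Recruit a set of k units; only the covariate is observed here.
            units = []
            for _ in range(k):
                y = rng.expovariate(1.0)
                covariate = rho * y + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
                units.append((covariate, y))
            # Judgment ranking: sort by the covariate, never by the response.
            units.sort(key=lambda unit: unit[0])
            # Measure the response of the unit judged to have rank r only.
            sample.append((r, units[r - 1][1]))
    return sample


def rss_mean(sample):
    """Average of all measured responses (unbiased for E[Y] under any ranking)."""
    return statistics.fmean(y for _, y in sample)


def top_rank_estimate(sample, k):
    """Hypothetical exponential-mean estimator that ASSUMES perfect ranking.

    Under perfect ranking, the largest of k exponentials has mean theta * H_k
    with H_k = 1 + 1/2 + ... + 1/k, so theta is estimated by mean(Y_[k]) / H_k.
    """
    harmonic = sum(1.0 / j for j in range(1, k + 1))
    return statistics.fmean(y for r, y in sample if r == k) / harmonic


rng = random.Random(2006)
k, cycles = 3, 20000
for rho in (1.0, 0.5):
    s = ranked_set_sample(k, cycles, rho, rng)
    print(f"rho={rho}: RSS mean={rss_mean(s):.3f}, "
          f"top-rank estimate={top_rank_estimate(s, k):.3f}")
```

With `rho = 1.0`, both printed values should be close to the true mean of 1. With `rho = 0.5`, the ranked-set-sample mean should stay close to 1, while the top-rank estimate should fall well below it; because this bias comes from the misspecified ranking model rather than from sampling noise, increasing `cycles` does not remove it.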
