Training Effects on Observer Operating Points in Technology Assessment Studies

$255,640 | R01 | FY2015 | CA | NIH

University Of Pittsburgh At Pittsburgh, Pittsburgh PA

Abstract

DESCRIPTION (provided by applicant): Retrospective observer performance studies have become the most common approach for inferring expected performance in clinical practice. Prospective assessments, in which the observer becomes an integral part of the diagnostic system, are expensive, difficult to perform, and time consuming, and they are generally performed only after a new technology or practice has been approved and has been in use for a while by a large number of users. The general belief is that findings from retrospective studies can be used to infer, at least on a relative scale, what should occur in actual clinical practice. Unfortunately, previous studies have shown that observers behave significantly differently in retrospective studies than during interpretations of actual clinical examinations that affect patient care. One important issue related to these inferences is the training of observers. While there are indications that, in the laboratory, disease prevalence has little, if any, effect on performance, the effect of training and, equally important, the possible effect of disease prevalence in the training set have not been investigated. This may be of utmost importance, because not only the overall performance but also the operating point matters for clinical applications when new technologies or practices are implemented. If a new system or practice is indeed better (i.e., performs along a higher curve), the tradeoff between increasing sensitivity, specificity, or a combination of both may depend largely on how observers are trained prior to a study (e.g., a particular emphasis on sensitivity or specificity during training).
The primary hypothesis to be tested in this study is that disease prevalence in training sets could, and likely would, affect the actual operating points of observers, but that observers will largely operate along a performance curve (ROC or FROC type) that is determined by (inherent to) the imaging system or practice rather than by the specific training set. Therefore, clinical practice could also be significantly affected by a specific emphasis during training. We propose to investigate this issue by performing a unique observer study in which observers are trained on sets with substantially different disease prevalence levels and then tested on the same case set. If our primary hypothesis is confirmed, then training with a specific emphasis could be used to optimize the intended clinical practice by emphasizing the desired parameter (i.e., sensitivity, specificity, or a combination of both). Thus, the proposed investigation will be extremely important for the acceptance of observer studies in the assessment and approval process of new technologies and practices. An actual example, in which the approval of a new technology was delayed by more than two years because of a clinically undesirable shift in operating points, directly demonstrates the importance of this study.
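
The distinction between a performance curve and an operating point on that curve can be illustrated with a minimal sketch. It assumes an equal-variance binormal observer model, which is a common textbook simplification and not a method stated in this abstract. The function names, the separation value `D_PRIME`, and the two decision criteria are hypothetical values chosen only for illustration; they are not study data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def operating_point(d_prime: float, criterion: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) for an equal-variance binormal observer.

    Assumption: ratings of non-diseased cases follow N(0, 1), ratings of
    diseased cases follow N(d_prime, 1), and a case is called positive when
    its rating exceeds `criterion`.
    """
    sensitivity = 1.0 - STD_NORMAL.cdf(criterion - d_prime)
    specificity = STD_NORMAL.cdf(criterion)
    return sensitivity, specificity


def recovered_d_prime(sensitivity: float, specificity: float) -> float:
    """Recover the curve parameter from one operating point: d' = z(sens) + z(spec)."""
    return STD_NORMAL.inv_cdf(sensitivity) + STD_NORMAL.inv_cdf(specificity)


# Hypothetical illustration: the same imaging system (same d'), read by an
# observer with a lax criterion and by one with a strict criterion.
D_PRIME = 1.5
lax = operating_point(D_PRIME, criterion=0.5)
strict = operating_point(D_PRIME, criterion=1.0)

for label, (sens, spec) in (("lax", lax), ("strict", strict)):
    print(f"{label:>6}: sensitivity={sens:.3f}  specificity={spec:.3f}  "
          f"d'={recovered_d_prime(sens, spec):.3f}")
```

In this sketch the two observers trade sensitivity against specificity, yet both operating points map back to the same `d'`, which is the sense in which an observer can shift along a curve without changing the curve itself.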