SEER RRSS #2: Frequency and Predictors of Missing Data in Site-Specific Variable
Emory University, Atlanta GA
Abstract
The number of site-specific variables collected by SEER is increasing. The objectives of the proposed study are to evaluate the completeness of the site-specific variables required by SEER, to identify factors associated with missing information, and to quantify the resources required to collect this information. To achieve these objectives, data will be examined for the cancers of interest reported between January 1, 2004 and December 31, 2007, and the proportions of cases with missing data will be calculated for the variables added to SEER after 2004. The association between the completeness of data and various registry-related, patient-related, tumor-related and provider-related characteristics will be examined. Finally, separate analyses will be conducted on six cancers of particular interest to quantify the amount of time and effort required to obtain information on the variables of interest.
New cancer characteristics are continually being added to SEER data collection based on their apparent diagnostic, therapeutic and prognostic importance as reported in clinical studies. Size of positive lymph nodes and presence of extracapsular extension have been shown to serve as valuable prognostic factors for various cancer sites of the head and neck. Alpha-fetoprotein (AFP) and beta human chorionic gonadotropin (β-HCG) levels assist in distinguishing testicular seminomas from non-seminoma cancers of the testes, and are used for monitoring disease progression and response to therapy. Treatment success and survival of both Hodgkin's and non-Hodgkin's lymphoma patients are associated with the presence of systemic symptoms such as night sweats, weight loss and recurrent skin itching (pruritus). Gleason score is one of the critical factors affecting treatment selection for prostate cancer patients.
Indications for hormonal and monoclonal antibody therapy are based on the status of estrogen/progesterone receptors (ER/PR) and human epidermal growth factor receptor 2 (HER-2), respectively. Despite their well-documented clinical importance, the utility of these variables in population-based data may be limited if the information is missing for a large proportion of cases. It is also important to note that data collection for these new variables may require substantial time and effort. The information required to code site-specific variables is expected to be readily available in some, but certainly not all, instances. For example, our experience with Gleason score data abstraction indicates that this prostate cancer-specific variable is almost always available from pathology reports. By contrast, determination of the patient's pre-diagnosis PSA level, which is now also required for newly diagnosed prostate cancers, may involve more extensive examination of medical records, particularly if the PSA analysis was performed on an outpatient basis. For all of the above reasons, and because SEER is interested in streamlining data collection procedures, the frequency with which the site-specific variables are reported will be determined, and the predictors and sources of the missing data will be examined. Additional analyses will focus on the amount of time and effort required to obtain data for different site-specific variables. Our overall long-term goal is to inform decisions on whether inclusion of specific variables is warranted, based on data availability weighed against the amount of time and effort required to collect those data.