IGF::OT::IGF OTHER FUNCTIONS - JIGSAW: ALGORITHM FOR CANCER BURDEN MEASURES USING CLAIMS DATA.

$104,750 | N02 | FY2016 | CA | NIH

Outcomes Insights, Inc., Agoura Hills CA

Investigators

Abstract

One of the missions of the Surveillance Informatics Branch within the Surveillance Research Program is to support innovative methods and approaches for analyzing, interpreting, and reporting cancer burden measures. SEER data are used extensively to monitor national trends in cancer incidence, prevalence, and survival in the US (http://seer.cancer.gov/csr/1975_2012/). Registry data linked to Medicare claims (SEER-Medicare) provide an opportunity to extend these reports to other measures, such as treatment utilization, prevalence of comorbidities prior to diagnosis, and other longitudinal information such as second-line therapy. Importantly, NCI is receiving real-time claims from oncology practices to enhance and supplement SEER data with detailed treatment information for the Georgia cancer registry. This process is being extended to 5 other registries. The availability of claims data provides an opportunity to expand the types of cancer burden measures included in standardized reports. The Jigsaw software is designed to automate the extensive data organization and cleaning required to create analyzable data sets from electronic health data such as claims data. Jigsaw is, in effect, a study builder: it can be used to select a cohort of individuals for a study from a complex set of health claims. The software also has the potential to create baseline, exposure, and outcome variables. Jigsaw generates variable labels, a data dictionary, a cohort file (one record per person), and an events file (multiple records per person, the format best suited to longitudinal analyses). The key to this functionality is the Jigsaw Algorithm Repository, which stores the algorithms required for manipulating the data and creating analysis data sets. This is accomplished by using ConceptQL, an open-source, domain-specific language for storing algorithms to find clinical events in electronic health data.
Because the algorithms are stored in a flexible, reusable way, they can be combined to produce many different types of studies. This is critical for producing reproducible studies and for updating studies when new data become available.
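
The distinction between a cohort file and an events file, and the idea of reusing stored algorithms, can be illustrated with a minimal Python sketch. This is not Jigsaw's implementation and does not use ConceptQL syntax. The claims records, the diagnosis codes, the `ALGORITHMS` dictionary, and the functions `find_events` and `build_cohort` are all hypothetical, chosen only to show the shape of the two output files.

```python
from datetime import date

# Hypothetical claims records: (person_id, service_date, diagnosis_code).
# The codes are illustrative only.
claims = [
    ("p1", date(2015, 1, 10), "C50.9"),
    ("p1", date(2015, 3, 2), "E11.9"),
    ("p2", date(2015, 2, 5), "E11.9"),
    ("p3", date(2015, 4, 1), "C50.9"),
    ("p3", date(2015, 6, 15), "C50.9"),
]

# Hypothetical algorithm store: each named algorithm is a set of codes.
# Storing definitions once lets several studies reuse them.
ALGORITHMS = {
    "breast_cancer": {"C50.9"},
    "diabetes": {"E11.9"},
}


def find_events(claims, algorithm_name):
    """Return one event record per claim matching the named algorithm."""
    codes = ALGORITHMS[algorithm_name]
    return [
        {"person_id": pid, "date": d, "event": algorithm_name}
        for pid, d, code in claims
        if code in codes
    ]


def build_cohort(events):
    """Collapse events to one record per person, keyed on the earliest date."""
    first = {}
    for e in events:
        pid = e["person_id"]
        if pid not in first or e["date"] < first[pid]:
            first[pid] = e["date"]
    return [{"person_id": pid, "index_date": d} for pid, d in sorted(first.items())]


# Cohort file: one record per person with a qualifying cancer claim.
cohort = build_cohort(find_events(claims, "breast_cancer"))
cohort_ids = {row["person_id"] for row in cohort}

# Events file: multiple records per person, restricted to cohort members,
# produced by applying every stored algorithm to the same claims.
events = [
    e
    for name in ALGORITHMS
    for e in find_events(claims, name)
    if e["person_id"] in cohort_ids
]
```

In this sketch the same stored definitions serve both to select the cohort and to populate the events file, so changing a definition in one place would update every study that uses it.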
