
Regional Oncology Research Center (Risk Factors)

$100,000 · P30 · FY2025 · CA · NIH

Johns Hopkins University, Baltimore MD

Abstract

This application is being submitted in response to administrative supplements for research leveraging novel data science approaches to address integration of modifiable risk factors on cancer outcomes. Tobacco, alcohol, and cannabis are three commonly used psychoactive substances in the United States, and all are modifiable behavioral risk factors with direct or indirect links to cancer. Critically, these substances are rarely used in isolation. National survey and intensive longitudinal data demonstrate high rates of same-day and same-occasion co-use. Co-use, especially if initiated early in life, may alter patterns of consumption (e.g., smoking more when drinking), exacerbate consequences of each substance, and increase cumulative exposure to carcinogens over the lifetime. Yet, despite the substantial prevalence of co-use of these substances and their associated cancer risk, large datasets collected in real-world settings that allow us to model co-use at high temporal resolution are rare. Traditional analytic models fall short in capturing complex interactions of these behaviors and lack the temporal granularity needed to examine the dynamics in real-world contexts. This project addresses this critical gap by leveraging intensive daily (24-hour) data on tobacco, alcohol, and cannabis use collected over 30-day periods in diverse populations to model the joint, dynamic impact of these cancer-relevant exposures. We will develop data science approaches with state-of-the-art machine learning techniques to identify high-risk co-use patterns, understand their occurrence in daily life, and uncover opportunities for intervention. The findings will support the development of targeted cancer prevention strategies and inform future clinical and public health efforts to mitigate poly-substance use earlier in life and thus reduce downstream cancer risk.
The proposal will address two specific aims: Aim 1 will harmonize and integrate five EMA datasets containing reports of tobacco, alcohol, and cannabis use collected over 24-hour periods. This will yield a dataset comprising over 8,000 daily (24-hour) observations from 373 participants, enabling robust temporal modeling of co-use. Aim 2 will use representation learning to learn behavioral embeddings from temporal co-use data. We will apply self-supervised learning, including contrastive and temporal embedding methods, to model time series data on tobacco, alcohol, and cannabis use. We will use these representations to (1) identify latent states and subgroups through clustering, and (2) integrate them with time-varying predictors. As a next step, high-resolution co-use profiles from Aim 2 can inform two translational applications: (1) Temporal patterns of tobacco, alcohol, and cannabis use can identify which substance to prioritize for intervention to most effectively reduce cancer-relevant exposure, based on frequency, context, and sequence. (2) Our fine-grained EMA data can be leveraged in AI models to impute co-use patterns in large datasets (e.g., PATH, All of Us) using demographic, substance use, and other data, which will enable broader cancer prevention modeling at the population level.
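
To make the shape of the harmonized data more concrete, here is a minimal Python sketch of what one daily (24-hour) record and a per-participant co-use summary might look like. It is an illustration only, not the project's method: the abstract proposes self-supervised representation learning, whereas this sketch computes a simple hand-crafted profile (the share of days in each co-use pattern) of the kind that could serve as a baseline input to clustering. The class name, field names, function names, and the toy records are all hypothetical and are not taken from the five EMA datasets.

```python
from collections import Counter
from dataclasses import dataclass

SUBSTANCES = ("tobacco", "alcohol", "cannabis")


@dataclass(frozen=True)
class DailyReport:
    """Hypothetical harmonized record: one row per participant per 24-hour period."""
    participant_id: str
    day: int          # study day index within the 30-day period
    tobacco: bool
    alcohol: bool
    cannabis: bool


def co_use_pattern(report: DailyReport) -> str:
    """Label a day by the substances reported, e.g. 'tobacco+alcohol' or 'none'."""
    used = [name for name in SUBSTANCES if getattr(report, name)]
    return "+".join(used) if used else "none"


def pattern_profile(reports: list[DailyReport]) -> dict[str, dict[str, float]]:
    """For each participant, the fraction of reported days in each co-use pattern."""
    counts: dict[str, Counter] = {}
    for r in reports:
        counts.setdefault(r.participant_id, Counter())[co_use_pattern(r)] += 1
    return {
        pid: {pattern: n / sum(c.values()) for pattern, n in c.items()}
        for pid, c in counts.items()
    }


# Toy, made-up records used only to exercise the functions above.
reports = [
    DailyReport("p1", 1, tobacco=True, alcohol=True, cannabis=False),
    DailyReport("p1", 2, tobacco=True, alcohol=False, cannabis=False),
    DailyReport("p1", 3, tobacco=True, alcohol=True, cannabis=False),
    DailyReport("p1", 4, tobacco=False, alcohol=False, cannabis=False),
    DailyReport("p2", 1, tobacco=False, alcohol=False, cannabis=True),
    DailyReport("p2", 2, tobacco=False, alcohol=True, cannabis=True),
]
profiles = pattern_profile(reports)
```

A profile like this discards the ordering of days, which is exactly the information that the temporal embedding methods named in Aim 2 are meant to retain; it is shown here only to illustrate the record structure.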

View original record on NIH RePORTER →