III: Small: Collaborative Research: Reducing Classifier Bias in Social Media Studies of Public Health
National Opinion Research Center, Chicago IL
Investigators
Abstract
Social media creates a new opportunity for public health research, offering greater reach at lower cost than traditional survey methods. Online content has several potential advantages over traditional survey data: researchers can measure in real time how behaviors and attitudes change in response to rare events such as legal changes, new products, and marketing campaigns. Machine learning techniques for classification can be used to tailor interventions that improve health outcomes while minimizing costs. However, online content is not a random sample, which can bias the outcomes. This proposal develops techniques to overcome this problem, enabling effective use of publicly available social media data for public health research. The approaches are compared against a traditional survey-based approach to assess end-to-end effectiveness in a real-world public health scenario: determining the effectiveness of smoking cessation campaigns.

The project builds on well-grounded statistical approaches to eliminate classifier bias. Key innovations are extending these approaches to the high-dimensional, noisy domain of textual social media data (specifically Twitter), achieving robustness to confounding variables, and developing scalable methods to identify comparison groups. Noisy data will be addressed by advancing multiple imputation techniques. The project will develop a model-based approach to identifying comparison groups that addresses confounding variables. The methods will be evaluated in the context of an actual public health study of smoking cessation, based on historical Twitter data, traditional surveys conducted before and after a CDC campaign, and a survey of smokers on the perceived risk factors of e-cigarettes.