CAREER: Machine Learning for Complex Health Data Analytics
University of Massachusetts Amherst, Amherst MA
Abstract
The fields of health and behavioral science are currently undergoing a data revolution. The Health Information Technology for Economic and Clinical Health Act of 2009 has resulted in the wide adoption of electronic health records and the emergence of increasingly vast stores of heterogeneous clinical data. Simultaneously, emerging mobile health (mHealth) technologies are enabling the collection of ever-larger volumes of continuous physiological measurements and behavioral self-report data in non-clinical settings. Such data sources have the potential to yield transformative advances in the fundamental understanding of human behavior and health. They also have the potential to significantly enhance numerous applications, including data-driven clinical decision support and continuous health monitoring, which will lead to increased efficiency within the healthcare system and facilitate a transition to patient-centered, personalized care. The proposed work will address several fundamental sources of complexity in the analysis of both clinical and mHealth data, enabling researchers in health and behavioral science to extract more useful knowledge from these data sources. The software toolboxes that will be developed will have immediate applications in research conducted by a network of research partners, and will also be broadly disseminated. The integrated education plan includes the development of an innovative applied machine learning course that will provide training in topics such as cloud-scale computing that are of direct relevance to massive health data analytics. The outreach plan involves developing and running a health data-themed outreach workshop for underrepresented groups to foster computational thinking and broaden participation in computing. The ability to learn models from complex data and apply those models to extract useful knowledge is at the core of machine learning research.
This proposal seeks to significantly expand the frontiers of machine learning by developing new models and algorithms designed to meet the challenges posed by complex health data analysis. Key sources of complexity in clinical and mHealth data include sparse and irregular sampling, incompleteness, noise, non-stationary temporal dynamics, between-subjects variability, high volume, high velocity, and heterogeneity. The presence of one or more of these factors in a given data source is often sufficient to render current machine learning methods ineffective or completely inapplicable. The long-term goal of this research is the development and validation of customized machine learning models and algorithms that can respond to all of these challenges. The objective of this proposal is to develop models and algorithms that address the following specific problems: (1) How can we extract useful knowledge from sparse and irregularly sampled clinical time series data? (2) How can we automate feature discovery from wearable physiological sensor data in the presence of high levels of noise, significant between-subjects variability, and heterogeneous sensing modalities? (3) How can we make the learning of physiological time series event detection algorithms robust to event labels that are obtained through self-report mechanisms with limited reliability and temporal fidelity?