CAREER: Semiparametric and Machine Learning Approaches to Big Data Challenges in Precision Medicine
North Carolina State University, Raleigh NC
Investigators
Abstract
In the era of Big Data, the goal of better patient outcomes, coupled with lower cost and burden, has generated tremendous interest in precision (or personalized, individualized) medicine, which is defined as treatments targeted to the needs of individual patients on the basis of genetic, biomarker, phenotypic, or psychosocial characteristics that distinguish a given patient from other patients with similar clinical presentations (Jameson and Longo, 2015). Precision medicine can be operationalized by using an individual's health-related metrics and environmental factors to discover individualized treatment regimes (ITRs); methodology for such discovery is an emerging field of statistics. The proposed methods are expected to accelerate the discovery of new personalized treatment strategies. The proposed work is therefore directly related to the White House Precision Medicine Initiative (https://www.whitehouse.gov/precision-medicine), a research effort to revolutionize how we improve health and treat disease. The proposed methods are also general enough to be applied to a variety of data sources, including clinical, biomarker, economic, and financial data. If successful, the project will greatly enhance the acquisition and analysis of large-scale data for the scientific and engineering communities. The main objective of this proposal is to develop cutting-edge semiparametric methods and machine learning tools to realize the promise of precision medicine. Specifically, the PI aims to: develop flexible and efficient methods for discovering optimal ITRs (Aim 1); develop a general class of optimal ITRs (Aim 2); develop optimal ITRs with high-dimensional data (Aim 3); and develop optimal ITRs under population heterogeneity (Aim 4). The proposed work contributes to both the semiparametric inference and machine learning fields. Machine learning methods have rarely been studied for doubly robust estimation and optimal ITRs with high-dimensional data.
The theoretical developments, including deriving nonasymptotic distributions, risk bounds, and new empirical process tools, are challenging. The methodologies to be developed in this project will be fundamentally important and generally applicable for studying semiparametric models in high-dimensional settings. Using semiparametric and machine learning methods for precision medicine is an emerging area. The integration of research and education is a key aspect of this project. New courses on statistical learning and semiparametric inference will be developed. These courses will broaden the areas of specialized training in a department that has a strong history of attracting underrepresented groups. The PI expects to stimulate interest from a diverse group of researchers in numerous fields. The PI will also reach out to K-12 education by training high school teachers.