CAREER: Super-Quantile Based Methods for Analyzing Large-Scale Heterogenous Data

$257,417 | FY2023 | MPS | NSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI

Investigators

Abstract

Data-driven decision-making has become ubiquitous across scientific disciplines and in every aspect of life. As society relies more heavily on statistical modeling to make decisions, statistical methods must be able to model heterogeneous data, so that different decisions can be made for different subgroups and lead to better outcomes for each. This is appealing in the big data era, as data-rich settings allow statistical models to be trained to be more flexible and more robust to heterogeneous data. This project focuses on developing a new class of statistical methods for modeling large-scale heterogeneous data based on the super-quantile (tail average), also referred to as the expected shortfall or conditional value-at-risk. The methods under development have the potential to answer questions that existing tools cannot answer directly in fields such as climate science, neuroscience, finance, and health disparity research. The project will allow undergraduate and graduate students to work on cutting-edge methods for modeling data heterogeneity. In addition, a graduate-level course on data heterogeneity will be developed. The investigator will also engage in K-12 educational outreach in collaboration with the Center for Educational Outreach at the University of Michigan.

With the abundance of data, there has been growing interest in modeling large-scale heterogeneous data for decision-making. Quantile regression is a representative statistical tool for modeling heterogeneous data, but it focuses on a single quantile level and may not be the best approach for scientific questions that involve aggregate information about the lower or upper tail of the distribution of interest. The investigator will develop a new class of super-quantile-based tools to help practitioners answer such questions. The project has three aims: (i) establish theoretical foundations and develop scalable computational algorithms for fitting super-quantile regression in the big-data regime, in the high-dimensional regime, and for data with outliers; (ii) develop super-quantile regression methods in the presence of unmeasured confounders; (iii) develop a series of super-quantile-based methods for analyzing different data types. The investigator will create a comprehensive platform, including software and e-book tutorials, to encourage the use of super-quantile-based methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
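
As a concrete illustration of the contrast the abstract draws between a quantile and a super-quantile (tail average), the following minimal Python sketch computes both for a sample. It is not the project's method or software: the function name `quantile_and_superquantile`, the toy samples, and the choice of the upper tail at level 0.8 are assumptions made for illustration only, and the tail average is computed with the standard Rockafellar-Uryasev identity q + E[(Y - q)+] / (1 - tau).

```python
import numpy as np


def quantile_and_superquantile(y, tau):
    """Return the empirical tau-quantile and upper-tail super-quantile.

    Hypothetical helper for illustration only; not from the award record.
    """
    y = np.asarray(y, dtype=float)
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    # Empirical quantile (NumPy's default linear interpolation).
    q = np.quantile(y, tau)
    # Rockafellar-Uryasev identity: q + E[(Y - q)_+] / (1 - tau).
    sq = q + np.mean(np.maximum(y - q, 0.0)) / (1.0 - tau)
    return q, sq


# Two toy samples that differ only in their largest observation.
light_tail = np.arange(1, 11)                          # 1, 2, ..., 10
heavy_tail = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

for name, sample in [("light", light_tail), ("heavy", heavy_tail)]:
    q, sq = quantile_and_superquantile(sample, tau=0.8)
    print(f"{name}: quantile={q:.2f}, super-quantile={sq:.2f}")
```

In this toy example the two samples share the same 0.8-quantile but have very different tail averages, which is the kind of aggregate tail information a single quantile level does not capture.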
