Reusing Data Efficiently for Iterative and Integrative Inference
Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI
Investigators
Abstract
Drawing knowledge and reproducible results from complex data drives a broad range of scientific disciplines. From a statistical viewpoint, model selection and inference are the two fundamental tasks, and inference is often pursued only after a model has been chosen through a data-driven procedure. Naively using the same data for both tasks creates complicated correlations between the selected model and its inferential properties, which inevitably affects the reproducibility of findings drawn from that model. The investigator develops methods that reuse the data from the selection step to account for these correlations without discarding information contained in the full data. With immediate applications in biomedical problems, observational studies in the behavioral sciences, and engineering, the methods will aid discovery even when analyses rely on scarce samples. The research also has a broader outreach component: creating opportunities for interdisciplinary engagement, training statisticians, and contributing to a new graduate curriculum.

The project is geared towards efficient and reproducible inference through the reuse of data from model selection steps. Combining ideas from convex optimization, probability theory, and statistical learning, the project pursues two main thrusts. In the first thrust, the investigator develops methods to integrate fresh samples that become available at a later point in time with the information obtained during selection. This workflow arises in modern applications such as online streaming data, which demand iterative inference on the fly. In the second thrust, the investigator explores integrative inference by combining models selected on different batches, splits, or sources of data. Aggregating inference across multiple sources through the reuse of samples has the potential to yield discoveries that any single dataset may miss for lack of power.
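The following is a minimal simulation sketch of why naively reusing data for selection and inference is a problem. It is not the investigator's method. It assumes a toy setting in which the response is pure noise (every true slope is zero), selection picks the single feature with the largest absolute marginal correlation, and inference is an ordinary 95% Wald interval for that feature's slope. It compares the naive approach, which selects and infers on the same data, with plain sample splitting, a standard baseline that is valid but discards half the sample for inference. The function names `select_feature`, `covers_zero`, and `simulate_coverage` are hypothetical.

```python
import numpy as np


def select_feature(X, y):
    # Hypothetical selection rule: the column with the largest |marginal correlation| with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return int(np.argmax(np.abs(corr)))


def covers_zero(x, y, z):
    # Simple linear regression of y on x (with intercept); check whether the
    # normal-approximation interval beta_hat +/- z * se contains the true slope 0.
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    beta = (xc @ yc) / sxx
    resid = yc - beta * xc
    sigma2 = (resid @ resid) / (len(y) - 2)
    se = np.sqrt(sigma2 / sxx)
    return bool(abs(beta) <= z * se)


def simulate_coverage(n=100, p=20, reps=2000, seed=0):
    """Monte Carlo coverage of nominal 95% intervals for the selected feature's slope.

    Assumed toy model: X has i.i.d. N(0, 1) entries and y is independent N(0, 1)
    noise, so the true slope of whichever feature is selected is 0.
    """
    rng = np.random.default_rng(seed)
    z = 1.959963984540054  # 97.5% quantile of the standard normal
    naive_hits = 0
    split_hits = 0
    half = n // 2
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)

        # Naive: select and infer on the same full sample.
        j = select_feature(X, y)
        naive_hits += covers_zero(X[:, j], y, z)

        # Sample splitting: select on the first half, infer on the held-out half.
        k = select_feature(X[:half], y[:half])
        split_hits += covers_zero(X[half:, k], y[half:], z)
    return naive_hits / reps, split_hits / reps


if __name__ == "__main__":
    naive, split = simulate_coverage()
    print(f"naive coverage: {naive:.3f}, split coverage: {split:.3f}")
```

In this null setting the naive intervals cover the true slope far less often than the nominal 95% (on the order of 0.95^20, about 0.36, because the selected statistic is the largest of roughly independent ones), while the split intervals stay close to nominal. The cost of splitting is that the selection half contributes nothing to inference, which is the kind of information loss the project's data-reuse methods are meant to avoid.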
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.