CDS&E-MSS: Data thinning: methods, theory, and applications
University of Washington, Seattle, WA
Abstract
In recent years, there has been an explosion of data across virtually all areas of science and engineering. This has been paired with an increase in the availability and complexity of statistical and machine-learning methods to analyze these data. However, a major challenge arises in validating the resulting models, i.e., in making sure that they capture signal rather than noise. This project will develop a new framework to split a dataset into two separate parts, so that one part can be used to fit a complex model and the remaining part can be used to validate it. Graduate students will be involved in this project, and the PI will carry out efforts to diversify the workforce in statistics.

This project explores “data thinning,” a new idea for splitting a dataset arising from a convolution-closed distribution into two or more independent components that sum to yield the original dataset. The PI will generalize the data thinning framework beyond convolution-closed distributions to a much broader class of distributions, using principles of sufficiency, by allowing a dataset to be thinned into two or more independent components that can be recombined to yield the original dataset via a function other than addition. Once two or more independent components have been obtained, a model can be fit to one component and validated on the other(s). Connections to sample splitting, data fission, and other related proposals will also be explored. The PI will also collaborate with biomedical researchers to tailor this framework to data from diverse domain areas and will release high-quality software. Ultimately, the data thinning framework will enable rigorous validation and inference after model fitting for a host of problems in science and engineering. Broader impacts include the involvement of graduate students in this research, as well as ongoing diversity and inclusion efforts by the PI.
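As an illustration of the convolution-closed case described above, the following is a minimal sketch that assumes the observations are Poisson counts, a standard example of a convolution-closed family. If X is Poisson with mean lambda and X1 given X is Binomial(X, eps), then X1 and X2 = X - X1 are independent Poisson variables with means eps * lambda and (1 - eps) * lambda, and they sum to X. The function names `thin_count` and `thin_counts`, the parameter `eps`, and the example data are hypothetical and chosen for illustration only; this is not the software the project will release.

```python
import random


def thin_count(x, eps, rng):
    """Split one count x into (x1, x2) with x1 ~ Binomial(x, eps), x2 = x - x1."""
    # Each of the x units is assigned to the first component with probability eps.
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1


def thin_counts(counts, eps=0.5, seed=0):
    """Thin every count in a list; returns (fit_part, validate_part)."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    rng = random.Random(seed)
    pairs = [thin_count(x, eps, rng) for x in counts]
    fit_part = [p[0] for p in pairs]
    validate_part = [p[1] for p in pairs]
    return fit_part, validate_part


# Illustrative counts (assumed Poisson for the independence property to hold).
counts = [3, 0, 7, 12, 5]
fit_part, validate_part = thin_counts(counts, eps=0.5, seed=0)

# The two components always recombine to the original data by addition.
recombined = [a + b for a, b in zip(fit_part, validate_part)]
```

Under the Poisson assumption, a model could be fit to `fit_part` and evaluated on `validate_part`. The independence of the two components depends on that distributional assumption; the recombination by addition holds regardless.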
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.