Experimental Design-based Weighted Sampling
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
Random sampling is commonly used to infer quantities of interest about a population. This project aims to develop a deterministic sampling technique that improves on existing sampling techniques by assigning a weight to each sample. Building on recent developments in data compression and subsampling algorithms, the new technique has the potential to overcome the computational challenges faced by existing techniques. The techniques are broadly applicable across science and engineering, in fields such as uncertainty quantification, Bayesian statistics, simulation, stochastic optimization, machine learning, and numerical analysis. The PI will develop software packages and distribute them to the public for free to enable widespread use of the proposed techniques. The sampling techniques will also be implemented in industry and tested in several engineering applications, which will provide a broad and immediate impact on society. The project also provides research training opportunities for graduate students.
The experimental design-based weighted sampling technique is developed by working on three broad classes of problems in statistics: (1) uncertainty quantification, where the probability density is known and fully specified; (2) Bayesian computation, where the probability density is known only up to a proportionality constant; and (3) data compression, where the probability distribution is unknown. The key idea of the proposed technique is to relax the restriction that the samples should follow the underlying distribution of the population. Instead, the samples are optimally generated to improve the estimation of the quantity of interest, and weights are then used to correct the distributional mismatch. The resulting weighted sample can provide more robust performance than existing unweighted samples.
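The key idea of weights correcting a distributional mismatch can be illustrated with a minimal Python sketch. This is a generic self-normalized weighting example, not the project's method: the evenly spaced design, the function names (`weighted_estimate`, `std_normal_unnormalized`), and the choice of 25 points are all assumptions made for illustration. The points deliberately do not follow the target distribution, and the target density is supplied only up to a proportionality constant, as in the Bayesian computation setting.

```python
import math

def weighted_estimate(points, unnormalized_density, func):
    """Self-normalized weighted estimate of E[func(X)].

    Assumes the points are evenly spaced, so each weight is simply
    proportional to the target density evaluated at that point.
    """
    raw = [unnormalized_density(x) for x in points]
    total = sum(raw)
    # Dividing by the total cancels the unknown normalizing constant.
    weights = [r / total for r in raw]
    estimate = sum(w * func(x) for w, x in zip(weights, points))
    return estimate, weights

def std_normal_unnormalized(x):
    # Standard normal density without its 1/sqrt(2*pi) constant.
    return math.exp(-0.5 * x * x)

# Hypothetical deterministic design: 25 evenly spaced points on [-6, 6].
# These points do not follow the normal target; the weights correct for that.
n = 25
points = [-6.0 + 12.0 * i / (n - 1) for i in range(n)]

mean_est, weights = weighted_estimate(points, std_normal_unnormalized, lambda x: x)
second_moment_est, _ = weighted_estimate(points, std_normal_unnormalized, lambda x: x * x)

print("estimated mean:", mean_est)
print("estimated second moment:", second_moment_est)
```

For a standard normal target the true mean is 0 and the true second moment is 1, so the two printed values give a quick check on the weighting. An optimized design would choose the point locations as well as the weights; the fixed grid here only shows the weighting step.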
The project develops new techniques such as optimally weighted quantizers, weighted minimum energy designs, adaptive integration methods using sequential designs, weighted twinning, and supervised data compression techniques using weights. Overall, this project will develop a suite of theoretically sound and computationally efficient techniques for weighted sampling, representing a significant advance over existing techniques, which focus on generating unweighted samples from the target distribution.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.