Collaborative Research: Smoothing Spline Semiparametric Density Models
University of Massachusetts Amherst, Amherst, MA
Abstract
A probability density function of multiple variables describes how likely the variables are to jointly take different values, and it therefore contains full information about the distribution of the individual variables and their interactions. Given observed data on the random variables, density estimation is at the heart of statistics and machine learning: classical problems such as regression, variable selection, clustering, and dimension reduction can all be cast as density estimation problems. Advanced density estimation methods are therefore essential for extracting as much information as possible from the data. However, there has been a lack of systematic research on flexible density estimation for high-dimensional data or complex data such as clustered data. The overall goal of this project is to develop a systematic, smoothing spline based framework for flexible density model building with complex and high-dimensional data. Because such data arise in a wide range of applications, the results of this research will be useful to researchers in many fields. In particular, the proposed methods will be applied to analyze data in health and medicine, speech, environmental change, food science, and computer science, in collaboration with researchers in these areas. High-performance computing tools will be developed as part of this research and made publicly available.
This project adopts a semiparametric approach that combines the advantages of parametric and nonparametric methods. Flexible and general semiparametric density and conditional density models for independent and clustered data will be developed and studied. Regularization methods will be developed for adaptive density estimation, variable selection in high-dimensional conditional density estimation, and interaction selection in semiparametric graphical models.
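For orientation, a smoothing spline density estimate is commonly written in the penalized likelihood form below. This is a hedged illustration of the general technique, not the project's specific models. The logistic transform keeps f positive and integrating to one. Splitting the log density η into a parametric part η_0(·; θ) and a nonparametric part η_1 in an RKHS H with roughness penalty J is one assumed way to make the model semiparametric. Here X_1, ..., X_n are the observations and λ > 0 is a smoothing parameter. Because η and η + c give the same density, a side condition (for example, removing constants from η) is needed for identifiability.

```latex
f(x) = \frac{e^{\eta(x)}}{\int_{\mathcal{X}} e^{\eta(t)}\,dt},
\qquad
\eta(x) = \eta_0(x;\theta) + \eta_1(x), \quad \eta_1 \in \mathcal{H},
\\[6pt]
(\hat\theta, \hat\eta_1) = \operatorname*{arg\,min}_{\theta,\ \eta_1 \in \mathcal{H}}
\left\{ -\frac{1}{n}\sum_{i=1}^{n} \eta(X_i)
+ \log \int_{\mathcal{X}} e^{\eta(x)}\,dx
+ \frac{\lambda}{2}\, J(\eta_1) \right\}
```

Larger λ forces η_1 toward the null space of J. For example, with a penalty on second derivatives, this pushes the nonparametric part toward a linear function.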
Nonparametric components will be modeled using reproducing kernel Hilbert spaces (RKHS), which handle density models on different domains, with different penalties, in a unified fashion. The semiparametric density models considered in this project include most existing semiparametric density models as special cases, as well as many interesting new models. Many of the methods developed in this project for adaptive estimation, model and variable selection, model diagnostics, and inference are new, and these methodologies constitute advances in density estimation.
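The Python sketch below gives a rough, hands-on picture of penalized likelihood density estimation. It is not the project's method. The helper name `fit_penalized_density`, the uniform grid, the nearest-grid binning of observations, and the second-difference penalty are all hypothetical choices made for illustration. The second-difference penalty is a discrete, P-spline-style surrogate for an RKHS roughness penalty. The sketch minimizes the negative log likelihood of η on the grid, plus the penalty, by Newton's method with a backtracking line search, using only NumPy.

```python
import numpy as np


def fit_penalized_density(x, lower, upper, m=100, lam=10.0, max_iter=50, tol=1e-10):
    """Hypothetical helper: illustrative penalized likelihood density on a grid.

    The log density eta is stored as its values at m grid points, and we minimize
        -(1/n) * sum_i eta(x_i) + log(h * sum_j exp(eta_j)) + (lam / 2) * ||D eta||^2,
    where D takes second differences. This is a discrete stand-in (an assumption)
    for a roughness penalty on eta'', not an RKHS smoothing spline implementation.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    grid = np.linspace(lower, upper, m)
    h = grid[1] - grid[0]

    # Bin each observation to its nearest grid point, so eta(x_i) ~ eta[idx_i].
    idx = np.clip(np.rint((x - lower) / h).astype(int), 0, m - 1)
    counts = np.bincount(idx, minlength=m).astype(float)

    # Second-difference matrix of shape (m - 2, m); D @ eta ~ h**2 * eta''.
    D = np.diff(np.eye(m), n=2, axis=0)
    P = lam * D.T @ D

    def objective(eta):
        shift = eta.max()  # log-sum-exp trick for numerical stability
        log_norm = shift + np.log(h * np.exp(eta - shift).sum())
        return -counts @ eta / n + log_norm + 0.5 * eta @ P @ eta

    eta = np.zeros(m)
    for _ in range(max_iter):
        p = np.exp(eta - eta.max())
        p /= p.sum()  # grid probabilities implied by the current eta
        grad = -counts / n + p + P @ eta
        hess = np.diag(p) - np.outer(p, p) + P
        # eta and eta + c give the same density, so the Hessian is singular along
        # the constant vector; take the minimum-norm Newton step via least squares.
        step = np.linalg.lstsq(hess, -grad, rcond=None)[0]
        slope = grad @ step  # negative for a descent direction
        if -slope < tol:
            break
        t, f0 = 1.0, objective(eta)
        # Backtracking (Armijo) line search keeps each update a descent step.
        while objective(eta + t * step) > f0 + 1e-4 * t * slope and t > 1e-8:
            t *= 0.5
        eta = eta + t * step

    density = np.exp(eta - eta.max())
    density /= density.sum() * h  # Riemann sum of the density over the grid is 1
    return grid, density


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=500)
    grid, dens = fit_penalized_density(sample, -4.0, 4.0)
    print(grid[np.argmax(dens)])  # expected to lie near 0 for standard normal data
```

The smoothing parameter `lam` plays the role of λ. Very large values flatten the second differences of η, which drives the log density toward a linear function. Very small values let the estimate follow the binned counts. In practice λ would be chosen by a data-driven criterion such as cross-validation, which this sketch omits.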