High Dimensional Mixture Models

$564,000 · FY2004 · MPS · NSF

Pennsylvania State Univ University Park, University Park PA

Investigators

Abstract

The purpose of this project is to develop theory, statistical methodology, and computational methods for mixture models in high-dimensional data. In the theoretical portion, the investigator enhances the potential statistical applications of these models by examining their topographical structure and their relationship to other high-dimensional methods such as local linear regression and hierarchical trees. New kernel densities are being constructed using ideas from diffusion processes. New methods to assess important aspects of identifiability in these models are under development. In addition to these basic theoretical developments, the investigator is creating a set of methods designed to fit diffusion mixture models, and to assess their fit, in high dimensions. A key part of this methodological development lies in computational enhancements.

Modern science presents the statistics community with a great challenge: to develop new tools for scientific inference in the aftermath of the data revolution. Modern data are potentially high in dimension and massive in the number of collected units. The probability models called mixture models have a long history of use in describing heterogeneity in data samples. They are extremely flexible and provide a compact picture of the key features of the data structure. Unfortunately, limited theoretical development in this difficult area has held back their use in high-dimensional problems. This research project targets a number of the key difficulties remaining in this area, including the integration of this methodology with other existing ones, the expansion of this methodology into new data types, and a better understanding of the structure of these models in very high dimensions. The methods that arise from these developments are being turned into computational packages so that they can be used by scientists.
