New Directions in Dimension Reduction

$178,543 · FY2002 · MPS · NSF

Pennsylvania State Univ University Park, University Park PA

Investigators

Abstract

DMS-0204662
PI: Bing Li

This research will develop methods in dimension reduction that aim at increased accuracy and a wider spectrum of applications. The work will proceed in three main directions. (1) The classical formulation makes the conditional density in regression the target for dimension reduction. This does not take into account that in many applications the primary interest centers on the conditional mean. Moreover, the classical formulation requires homoskedasticity among predictors, which can be too restrictive for some problems. To address these issues, the investigator proposes to reformulate the problem as reducing the dimension of the predictors as they appear in the conditional mean. This will allow further dimension reduction, improve accuracy, and remove the requirement for homoskedasticity. (2) The classical formulation cannot handle categorical predictors, which occur frequently in practice. This research will broaden the proposed formulation so that it can handle such cases. (3) It is then possible and natural to combine these two new elements to develop a more focused and less restricted dimension reduction method for conditional means in regressions involving categorical predictors.

Dimension reduction methods were originally introduced to provide a comprehensive graphical tool for exploratory data analysis. The field has recently seen active development because the rapid growth of computing power has dramatically increased the scope and dimension of collected data sets. Besides its important role as a graphical method, dimension reduction is particularly useful in problems where interest lies in identifying connections among the variables, such as classification and clustering. It is also useful when the dimension of a data point exceeds the total number of data points, which is typically the case for many scientific data sets, such as gene expression data. However, the available dimension reduction methods have several limitations, such as assuming homogeneity between predictors and not being able to handle categorical predictors. This research will tackle these limitations of the current methodology.
