CAREER: Research and Education of Flexible Methods for Statistical Modeling and Prediction
University of Wisconsin-Madison, Madison WI
Abstract
In recent years, many novel techniques for regression, classification, and density estimation have been developed, both in statistics and in related areas such as machine learning and neural networks. Some of these methods have been very successful in practice, but their statistical properties are not fully understood, which hinders their further development. The goal of the proposed research is to gain statistical insight into these techniques and to develop new methodologies and improved algorithms. The specific techniques investigated are the support vector machine, randomized trees, and the log density functional ANOVA model for continuous and mixed data.
Several new techniques are introduced. The support vector machine for multi-category classification with arbitrary cost structures will be further developed. A new framework is proposed that connects adaptive nearest neighbor estimation and randomized trees. Through the use of the sparse grid method, a backfitting-type algorithm is proposed for fitting the log density functional ANOVA model, with applications to graphical models for continuous and mixed data. These new techniques will be examined through theoretical investigation and empirical evaluation. The investigator will develop a graduate-level course on flexible methods for regression, classification, and density estimation, and their applications. Part of the proposed research will be incorporated into the course material.
Regression, classification, and density estimation are standard problems in statistics. Traditional methods typically employ strong distributional assumptions. With the vast computing power available today, it has become possible to develop and implement more flexible methods, and a host of new techniques has emerged, both in statistics and in related areas. Many of these are computationally intensive, and their statistical properties are not well understood. A clear understanding of these methods is crucial for their further development and for statistical education. The proposed research develops valuable insights into flexible statistical methods of current research interest. The techniques developed in the research provide new and useful tools for efficient data analysis and can be applied to many problems in the medical, social, economic, environmental, and biological sciences. An important aspect of current statistical education is the teaching of flexible statistical methods that take advantage of today's computing power, and of their application in different scientific and industrial areas. The insights and new techniques developed in the proposed research will be incorporated into graduate-level courses and will benefit the training of graduate students.