Complexity Regularization in Statistical Learning Theory

$217,989 | FY2009 | MPS | NSF

Georgia Tech Research Corporation, Atlanta GA

Investigators

Abstract

Vladimir Koltchinskii studies two important classes of problems in High-Dimensional Statistics and Machine Learning: sparse recovery and manifold learning. In both cases, the focus is on problems in which penalized empirical risk minimization with convex loss functions and convex complexity penalties is used to define statistical estimators of target functions, and in which the geometric nature of the problem plays an important role. One goal is to extend the theory of sparse recovery that emerged in Harmonic Analysis, Signal Processing and Statistics beyond the usual framework of finite dictionaries to include a variety of problems that are important in Machine Learning (in particular, in kernel machine methods and ensemble methods). Specifically, the aim is to develop a theory of sparse recovery based on penalized empirical risk minimization in large ensembles of kernel machines and in linear spans and convex hulls of infinite dictionaries. Another goal is to develop a mathematical theory of several manifold learning methods introduced in recent years. This includes methods for statistical estimation of partial differential operators associated with a manifold, such as the Laplace-Beltrami operator, based on data sampled from the manifold. These operators are used to develop an "approximate version" of harmonic analysis for functions on the manifold, which is important in nonparametric function estimation. In particular, the research focuses on the analysis of regularized estimators of eigenvalues and eigenfunctions of these operators and on the development of error bounds for complexity-regularized estimators in learning problems for manifold data. The project is closely related to several lines of research in Mathematics, Statistics and Computer Science.
Understanding the subtle geometric nature of complex, high-dimensional data sets, and taking it into account in developing statistical inference for such data, are important challenges in Statistics and Machine Learning. Sparse recovery and manifold learning are among the most important developments in these areas, in which the methods of Asymptotic Geometric Analysis, High-Dimensional Probability and Differential Geometry are used to study a number of challenging statistical problems. This leads to new mathematical tools and new statistical methods with potential applications in a variety of areas where the approach based on Machine Learning is crucial, such as Brain Imaging, Bioinformatics, and Data and Visual Analytics. The research also benefits education by providing training opportunities for graduate students, and it facilitates exchanges and collaborations among Mathematics, Statistics and Computer Science.
