Model-based Classification of Longitudinal and Functional Data
Northern Illinois University, DeKalb, IL
Abstract
The idea of classification permeates many scientific studies and arises in almost every area of human endeavor, including the classical problems of numerical taxonomy and market segmentation and the modern areas of machine learning, experimental spectroscopy and biotechnology. Fisher's linear classifier, which maximizes the separation between the groups in the spirit of analysis of variance, is optimal when each group is multivariate normal with a common covariance matrix. When the covariance matrices differ, the optimal classifier is no longer linear but quadratic, and it performs poorly in small samples because a separate covariance matrix must be estimated for each group. Although there are many heuristic and ad hoc methods for handling unequal covariances, model-based approaches using mixtures of multivariate normal distributions and the spectral decomposition of the covariance matrices have shown great promise for traditional multivariate data. The goal of this research is to develop new and flexible classification methods for longitudinal, functional and multivariate time series data using the Cholesky decomposition of the covariance matrices instead of their spectral decomposition. For such data, the Cholesky decomposition is more suitable: its components have both a statistical interpretation, as the coefficients of regressing each measurement on its predecessors, and a geometric interpretation in terms of the volumes, shapes and orientations of the ellipsoids representing the groups in the data. It is proposed to study the computational, statistical and empirical aspects of using the Cholesky decomposition and to compare the results with those obtained using the spectral decomposition. The methods and tools to be employed include generalized linear and mixed models, factor analysis, time series analysis, maximum likelihood and Bayesian estimation of mixture models in the presence of missing values, cross-validation and the bootstrap. The proposed research will extend classical discriminant analysis to longitudinal and functional data.
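The regression interpretation of the Cholesky components can be illustrated with a minimal sketch, assuming the standard modified Cholesky form T Sigma T' = D, where T is unit lower triangular and D is diagonal. Under that assumption, the below-diagonal entries of row t of T are the negatives of the coefficients from regressing measurement t on measurements 1, ..., t-1, and D holds the corresponding prediction error (innovation) variances. The sketch then plugs these components into an ordinary Gaussian quadratic discriminant rule. This is an illustration only, not the project's method: the function names (modified_cholesky, autoregressive_coefficients, gaussian_logpdf_cholesky, classify) and the choice of a plain quadratic discriminant rule are hypothetical.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T == diag(d)."""
    L = np.linalg.cholesky(sigma)  # sigma = L @ L.T, L lower triangular
    c = np.diag(L)                 # positive diagonal of L
    L_unit = L / c                 # divide column j by c[j] -> unit diagonal
    T = np.linalg.inv(L_unit)      # inverse of unit lower triangular is unit lower triangular
    d = c ** 2                     # innovation (prediction error) variances
    return T, d


def autoregressive_coefficients(sigma):
    """Regress y_t on y_1..y_{t-1} directly from sigma; return (phi, innovation variances)."""
    p = sigma.shape[0]
    phi = np.zeros((p, p))
    d = np.empty(p)
    d[0] = sigma[0, 0]
    for t in range(1, p):
        coef = np.linalg.solve(sigma[:t, :t], sigma[:t, t])  # normal equations
        phi[t, :t] = coef
        d[t] = sigma[t, t] - sigma[:t, t] @ coef             # residual variance
    return phi, d


def gaussian_logpdf_cholesky(y, mu, T, d):
    """Normal log-density via inv(Sigma) = T.T @ diag(1/d) @ T and det(Sigma) = prod(d)."""
    eps = T @ (y - mu)  # innovations: y_t minus its linear prediction from earlier values
    return -0.5 * (len(y) * np.log(2 * np.pi) + np.sum(np.log(d)) + np.sum(eps ** 2 / d))


def classify(y, groups, priors):
    """groups: list of (mu, sigma) pairs; return index of the largest log-posterior score."""
    scores = []
    for (mu, sigma), prior in zip(groups, priors):
        T, d = modified_cholesky(sigma)
        scores.append(np.log(prior) + gaussian_logpdf_cholesky(y, mu, T, d))
    return int(np.argmax(scores))


if __name__ == "__main__":
    p, rho = 5, 0.6
    lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    sigma_pos = rho ** lags     # AR(1) correlation: smooth trajectories
    sigma_neg = (-rho) ** lags  # AR(1) correlation: oscillating trajectories
    T, d = modified_cholesky(sigma_pos)
    phi, d_reg = autoregressive_coefficients(sigma_pos)
    print(np.allclose(T, np.eye(p) - phi), np.allclose(d, d_reg))  # True True
    mu = np.zeros(p)
    y = np.ones(p)              # a flat, smooth trajectory
    print(classify(y, [(mu, sigma_pos), (mu, sigma_neg)], [0.5, 0.5]))  # 0
```

For the AR(1) example, T recovers a single lag-one coefficient of 0.6 in each row, and the innovation variances are 1 followed by 0.64. In a sketch like this, group differences enter only through each group's T and d, so constraints on those components (for example, innovation variances shared across groups) would change the classifier directly. Which constraints the project actually uses is not specified here.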
The research is of great practical interest. It will provide insight into when a particular classification method can be expected to work well, and it may lead to new classification criteria and methods for discriminating between nuclear explosions and earthquakes, a problem of critical importance for monitoring a comprehensive test-ban treaty. The broader impact of the proposed work extends to settings where large amounts of high-dimensional multivariate data are collected, such as clinical trials, biotechnology, environmental monitoring and global change, epidemiology and financial econometrics.