New Developments of Nonlinear Dependent Models, with Applications in Genetics, Finance and the Environment

$180,000 · FY2008 · MPS · NSF

University Of Wisconsin-Madison, Madison WI

Abstract

High-dimensional and complex data are now collected routinely in environmental science, financial markets, and signal and image processing. A major challenge is to find methods that analyze the structure of such data sets, fit models with desired dependence properties, and identify and validate patterns. A major goal of the proposal is to make significant methodological and theoretical contributions to important and challenging small-sample, high-dimensional inference problems, such as dimension reduction in bioinformatics, and to extreme-dependence problems arising in environmental science and financial markets. The proposal consists of four steps. First, it develops a series of new measures of nonlinear dependence. The investigator studies the limiting distributions of these dependence measures (quotient correlation coefficients) and analyzes the asymptotic power of the associated tests when the dependence structure is specified. In DNA microarray data analysis, the quotient correlation is used to select the best feature subset of genes, and the selected subset is then used to predict the classes of all samples. Second, a tail dependence measure (a tail quotient correlation coefficient) with a varying threshold is introduced. This measure is related to the statistics of multivariate extremes and is used to assess asymptotic (in)dependence among environmental variables. Third, the proposal develops statistical estimation methods for asymptotically (in)dependent multivariate maxima and moving maxima processes, which allow clustered spatial-temporal extreme observations to be studied efficiently. Fourth, the proposal studies GARCH(r,s) models with m-dependent residuals. The intellectual merit of the proposal stems first from an efficient dimension reduction approach based on the quotient correlation concept.
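The abstract names the quotient correlation coefficient and its tail version without defining them. The sketch below assumes one form from the quotient-correlation literature: with a = max_i(Y_i/X_i) and b = max_i(X_i/Y_i), the coefficient is (a + b - 2)/(ab - 1), and the tail version applies the same formula after replacing each observation v by max(u, v) for a threshold u. Both definitions, the rank-based unit Fréchet transform, and the function names `to_unit_frechet` and `quotient_correlation` are assumptions for illustration, not the investigator's code.

```python
import math
from typing import List, Sequence


def to_unit_frechet(values: Sequence[float]) -> List[float]:
    """Map data to approximately unit Frechet margins via ranks (assumed preprocessing).

    Uses the empirical probability p = rank / (n + 1) and returns -1 / log(p).
    Ties receive distinct ranks in input order (a simplifying assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = -1.0 / math.log(rank / (n + 1))
    return out


def quotient_correlation(x: Sequence[float], y: Sequence[float], u: float = 0.0) -> float:
    """Sample quotient correlation; with u > 0, the thresholded (tail) version.

    Assumed form: each value v is replaced by max(u, v); then, with
    a = max_i y_i / x_i and b = max_i x_i / y_i, return (a + b - 2) / (a * b - 1).
    u = 0 leaves positive data unchanged, giving the plain coefficient.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    xs = [max(u, v) for v in x]
    ys = [max(u, v) for v in y]
    if min(xs) <= 0.0 or min(ys) <= 0.0:
        raise ValueError("thresholded values must be strictly positive")
    a = max(yi / xi for xi, yi in zip(xs, ys))
    b = max(xi / yi for xi, yi in zip(xs, ys))
    # a * b >= 1 always (a >= y_1/x_1 and b >= x_1/y_1); equality means all
    # ratios y_i / x_i are equal, and the coefficient is then undefined.
    denom = a * b - 1.0
    if denom <= 1e-12:
        raise ValueError("degenerate sample: all ratios y_i / x_i are equal")
    # When a >= 1 and b >= 1 (typical on the unit Frechet scale), the result lies
    # in [0, 1], because denom - (a + b - 2) = (a - 1) * (b - 1) >= 0.
    return (a + b - 2.0) / denom


# Usage on toy data (values chosen by hand, not taken from the proposal).
x = to_unit_frechet([0.3, 1.2, 2.5, 0.8, 1.9])
y = to_unit_frechet([0.4, 1.6, 2.7, 0.9, 1.1])
print(quotient_correlation(x, y))         # plain coefficient
print(quotient_correlation(x, y, u=1.0))  # tail version at threshold u = 1
```

Folding the threshold into a single parameter is a design choice of this sketch: setting u = 0 recovers the plain coefficient for positive data, and varying u mimics the "varying threshold" the abstract describes.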
In DNA microarray data analysis, gene expression profiles contain thousands of variables (genes), and class prediction is an important problem. Because the data are high-dimensional and sample sizes are small, it is important to identify subsets of genes to work with. The ultimate goal is to select the smallest subset of genes that contributes to classification and prediction. Among existing gene selection methods, it is hard to find one that consistently performs better than the rest across different data sets, and the proposal specifically aims to address this problem.

Beyond its methodological merits and specific applications, the proposal also has considerable broader impact. Across applications in diverse fields such as those above, extreme risks play an important scientific, societal, and possibly political role. Disseminating new statistical tools that lead to a better understanding of the occurrence of joint extremes is therefore of great importance; this can be achieved through new graduate courses, publications in journals aimed at a broad audience, and discussions with scientists from other fields. As one example of the proposal's potential impact, consider financial risk management. Under the new regulatory rules for banking solvency (the so-called Basel II proposal), banks must develop stress-testing procedures (in their analysis of credit risk, for instance) that can be formulated directly in terms of extremal co-movements. Similarly, in multi-line insurance, underwriters must account for joint large losses across many different lines of business. It is precisely for these kinds of applications that the proposal yields new tools.
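Stress tests phrased in terms of extremal co-movements need some measure of how often two risks are large at the same time. As a hedged illustration only, the sketch below uses a standard empirical quantity, a joint exceedance ratio P(X > q_X(p), Y > q_Y(p)) / (1 - p), which is a finite-level analogue of the upper tail dependence coefficient. It is not the tail quotient correlation developed in the proposal; the function names, the order-statistic quantile convention, and the toy loss data are all assumptions.

```python
import math
from typing import Sequence


def empirical_quantile(values: Sequence[float], p: float) -> float:
    """Order-statistic quantile: the ceil(p * n)-th smallest value (assumed convention)."""
    s = sorted(values)
    k = max(1, math.ceil(p * len(s)))
    return s[k - 1]


def joint_exceedance_ratio(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Empirical P(X > q_X(p), Y > q_Y(p)) / (1 - p).

    Values near 1 suggest the two series tend to be extreme together at level p;
    values near 0 suggest their large values rarely coincide.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    qx = empirical_quantile(x, p)
    qy = empirical_quantile(y, p)
    # Count observations where both series exceed their own level-p quantile.
    joint = sum(1 for xi, yi in zip(x, y) if xi > qx and yi > qy)
    return (joint / len(x)) / (1.0 - p)


# Toy losses for two business lines (invented numbers, for illustration only).
line_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
line_b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 7.0, 8.0, 10.0, 9.0]
print(joint_exceedance_ratio(line_a, line_b, p=0.8))
```

With only a handful of exceedances the estimate is very noisy, which is one reason model-based tools for joint extremes, such as those the proposal develops, are needed in practice.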
