
Kolmogorov's Algorithm Statistics for Dynamics in High Frequency Data

$348,765 · FY2010 · MPS · NSF

University Of California-Davis, Davis CA

Investigators

Abstract

When applying likelihood theory to high-frequency time series data, scientists and analysts simultaneously face computational complexity, due to the need to accommodate several million observations, and informational complexity, due to the involvement of manifold and diverse dynamic mechanisms. The investigator proposes to establish Kolmogorov's algorithmic statistics as the unified foundation to bridge the gaps caused by these computational and informational complexities, and to make systematic and effective discovery of characteristic dynamics possible. The proposal focuses on developing this foundation by resolving critical issues, including: What are the models of data's individuality and typicality, why are they crucial, and how can scientists and analysts apply them for discovery and detection? A new vehicle for this development is the Hierarchical Factor Segmentation (HFS) algorithm. This distinct approach transforms an observed time series into various counting processes corresponding to different events of interest, and then applies coding schemes to achieve lossy data compression as a way to find the governing state-space trajectory. This is accomplished without estimating the point processes' time-varying intensity functions, without relying on unrealistic prior knowledge about the number of changes, and without assumptions about the regime-generating mechanisms. Using the computed state-space trajectory, the investigator is able to modify or replace currently prevailing statistical thinking (such as likelihood theory) and existing popular methodologies (such as those based on statistical correlation and association) by using the connectivity and concurrence of the decoded states.
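The first step described above, turning an observed series into a counting process for an event of interest, can be illustrated with a minimal sketch. This is not the HFS algorithm itself, whose details the abstract does not give. The event definition (a value exceeding a fixed threshold), the function names, and the sample data are all assumptions made for illustration.

```python
from typing import List, Sequence


def event_indicator(series: Sequence[float], threshold: float) -> List[int]:
    """Mark each time point 1 if the value exceeds the threshold, else 0.

    The threshold rule is an assumed, illustrative event definition.
    """
    return [1 if x > threshold else 0 for x in series]


def counting_process(indicator: Sequence[int]) -> List[int]:
    """Cumulative number of events up to and including each time point."""
    counts = []
    total = 0
    for flag in indicator:
        total += flag
        counts.append(total)
    return counts


def recurrence_times(indicator: Sequence[int]) -> List[int]:
    """Gaps, in time steps, between successive events."""
    times = [t for t, flag in enumerate(indicator) if flag == 1]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical sample series; values above 2.0 count as events.
sample = [0.1, 2.5, 0.3, 0.2, 3.1, 2.9, 0.0]
flags = event_indicator(sample, threshold=2.0)
print(flags)                    # [0, 1, 0, 0, 1, 1, 0]
print(counting_process(flags))  # [0, 1, 1, 1, 2, 3, 3]
print(recurrence_times(flags))  # [3, 1]
```

Short recurrence times indicate periods where the event clusters, and long ones indicate quiet periods. A segmentation method could use such gaps to separate regimes, although how HFS codes and segments them is not specified in this record.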
Real-world applications in finance, biology and national security will illustrate the merit and potential of this new statistical thinking and computing for discovering real dynamics that are of great interest in the sciences and in society. Currently, there are many situations in which data are sampled and recorded on a time scale of milliseconds, or even nanoseconds. These high-frequency data are found not only in the sciences, but also in economics, finance and national security. However, due to their enormous length and complexity, these data cannot be handled well using existing statistical methodologies. In fact, prevailing statistical thinking is inadequate for resolving the issues underlying these kinds of data. New statistical thinking is urgently needed to bridge the gap between computing and conception in order to produce coherent and real mechanisms for data analysis. The investigator proposes algorithmic statistics as the new foundation for scientists and analysts to focus on extracting key characteristics, such as individuality and typicality, within high-frequency data. An algorithmic statistic is a computer algorithm that takes a multidimensional time series consisting of millions of time points as the input, and efficiently computes realistic and sufficient analytic results as the output. Accordingly, classic concepts, such as correlation and association, would be modified or replaced based on the connectivity and concurrence of decoded significant regimes. In particular, the resulting models of individuality and typicality are useful and important for regulation and detection. This proposal also targets the detection of any abnormality or extremism, for example, in trading or in physiological and behavioral processes, embedded within long and noisy high-frequency data.
