Change Point Detection for Data with Network Structure
University Of Florida, Gainesville FL
Investigators
Abstract
Detecting breaks and anomalies in a mechanism that drives the generation of data represents a critical task, due to numerous applications in high-impact areas including health, social, and engineering sciences. This project aims to advance the state of the art of change point analysis for big and complex data, by developing a simple to implement, yet powerful, scalable algorithmic framework, thus providing new tools to examine high-dimensional, long streams for events of interest. The potential application domains of this project include but not limited to occurrence of seizure in brain connectivity data sets, coordinated market and other systemic failures in economic and finance data, and identification of orchestrated malicious activities in computer network streams. The developed algorithms and methodology will be implemented in open-source software, while curated data sets will be made available to the community for use in change point analysis investigations. The project will offer multiple unique opportunities for interdisciplinary research training of the future generation of statisticians and for further enhancement of diversity in mathematical sciences. To achieve the stated goals, the project (i) develops a unified detection framework for change points in complex statistical models for network and high dimensional time streams and (ii) provides a rigorous theoretical analysis of their accuracy in the form of consistency, finite sample bounds, and asymptotic distributions for the change points and other model parameters. The framework leverages a simple, easy to implement two-step strategy, wherein the first step one selects windows of the time series of appropriate length and using a standard exhaustive search strategy identifies at most a single change point in each of them. In the second step, a second search based on a global information criterion is employed to eliminate spurious change points. The strategy exhibits linear complexity in time (and thus matches the fastest available in the literature), yet is simple to implement and theoretically analyze, in particular for complex statistical models that exhibit network and low rank structure. Further, the following issues are rigorously addressed: (i) conditions of identifiability of the model parameters and the change points and (ii) probabilistic guarantees and uncertainty quantification for them in the presence of high dimensionality, network structure, temporal dependence, as well as dependence across data streams. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →