GGrantIndex
← Search

CIF:Small:Collaborative Research:Statistics of slow mixing Markov processes: theory and applications to community detection

$499,468FY2016CSENSF

University Of Hawaii, Honolulu

Investigators

Abstract

Historically, the field of statistics developed around multiple independent observations of phenomena. Yet progress in modern applications from biology to social networks increasingly requires deeper understanding of observations that are dependent. Dependent observations introduce biases unseen in independent sampling. If not interpreted properly, they often lead to misleading conclusions. This research develops a statistical framework to interpret samples generated by dependent observations. The problem is then turned around to develop algorithms that detect meaningful dependencies in data. Such algorithms are used, for example, to identify genes working together, or find the extent to which social networks are polarized on certain issues. Not only is this research translated to classes, demonstrations and outreach programs, but also the location in Hawaii is leveraged to reach out to underrepresented communities in science and engineering, in particular, women and the Pacific Islander community. Stationary properties of Markov processes may not be very well reflected by finite samples, no matter how large the sample size. This bias is often formalized as mixing, and is unseen in independent sampling. This research analyzes Markov samples even before the samples reflect the asymptotic stationary properties, and develops a framework for inference, analysis and simulation of slow mixing Markov processes. Then, algorithmic primitives based on coupling from the past are developed for the widely studied task of community detection in a graph, where vertices are partitioned into clusters so that intra-cluster vertices are connected tighter than those in disparate clusters. As several resampling procedures in machine learning such as cross validation and bootstrap are premised on independent sampling and have no good analogs in the Markov setting, this research develops new resampling approaches for potentially slow mixing Markov processes.

View original record on NSF Award Search →