SGER: Stochastic Methods for Information Retrieval Systems
North Carolina State University, Raleigh NC
Investigators
Abstract
This research explores the use of stochastic processes to develop new techniques that can serve as a theoretical and computational basis for information retrieval systems, pattern matching systems, and more generally any application that must reveal hidden connections in an indexed but otherwise unorganized collection of information. The methodology rests on constructing a Markovian model of the underlying information and using mean first passage times as an asymmetric, aspatial metric to gauge degrees of contiguity in the information. The following topics are being studied:
- Establishing the theoretical extent to which mean first passage times reveal hidden connectivity in bodies of information of varying type and size.
- Developing and implementing fast algorithms for computing mean first passage times. This includes determining the computational feasibility of using the mean first passage time metric on different kinds of large-scale data sets.
- Using a Markov model to capture hidden connections that the Google PageRank approach fails to identify, and assessing the tradeoff between this added capability and the increased computational effort relative to simple PageRank computations.
- Provided that mean first passage times can be shown to be theoretically and computationally feasible, investigating the problems associated with updating and downdating the underlying information. Additions, deletions, or changes to information almost always create, destroy, or change connections (direct as well as latent), and dealing with these effects in large-scale systems running in (or near) real time is a significant hurdle to overcome. Efficient updating techniques for stationary probabilities as well as mean first passage times that are superior to current methods are being evaluated.
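To make the central quantity concrete, the following is a minimal sketch of how mean first passage times could be computed for a small Markov chain. It is not the project's algorithm: it assumes a dense, irreducible row-stochastic transition matrix and uses the textbook relation m_ij = 1 + sum over k != j of p_ik * m_kj, solved one target state at a time by plain Gaussian elimination. The function names `solve_linear` and `mean_first_passage_times` are hypothetical, chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; the chain may not be irreducible")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean_first_passage_times(p):
    """Return matrix mfpt where mfpt[i][j] is the expected number of steps
    to first reach state j starting from state i (i == j gives the mean
    return time). Assumes p is row-stochastic and the chain is irreducible."""
    n = len(p)
    mfpt = [[0.0] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        # Delete row and column j from (I - P); solve (I - P_j) m = 1.
        a = [[(1.0 if r == c else 0.0) - p[r][c] for c in others]
             for r in others]
        col = solve_linear(a, [1.0] * len(others))
        for idx, i in enumerate(others):
            mfpt[i][j] = col[idx]
        # Mean return time: one step out of j, then the passage back to j.
        mfpt[j][j] = 1.0 + sum(p[j][k] * mfpt[k][j] for k in others)
    return mfpt


# Hypothetical two-state chain used only as an illustration.
p_example = [[0.5, 0.5],
             [0.25, 0.75]]
m_example = mean_first_passage_times(p_example)
```

For the two-state illustration, the expected passage time from state 0 to state 1 is 2 steps while the reverse passage takes 4, which shows the asymmetry of the metric. Solving one dense system per target state costs on the order of n^4 operations overall, so a sketch like this is only suitable for small chains; the large-scale setting described above would need the faster algorithms the project sets out to develop.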