Ranking Large Information Sets

$330,000 | FY2011 | ENG | NSF

Columbia University, New York NY

Investigators

Mariana Olvera-Cravioto (contact), Predrag R Jelenkovic

Abstract

The objective of this research project is to develop statistical tools and construct efficient simulation methods for validation and testing. Rapidly growing webs of interconnected, dynamic and complex information sets, such as the World Wide Web (WWW), scientific data, social networks, news, and national security data, are reaching unprecedented scales. Hence, effective methods for ordering and ranking these data sets are of utmost importance for making the best use of this wealth of information. Given that the scale and complexity of these information sets will continue to increase in the future, a new probabilistic approach for understanding their average behavior is needed in the same way that statistical mechanics was needed for understanding large sets of molecules. To this end, statistical tools will be developed for the analysis of a variety of dynamic, distributed, and possibly nonlinear information ranking algorithms. The novel statistical methodology to be developed will provide a framework for designing ranking algorithms with a pre-specified behavior. As a complement to the analytical work, efficient simulation methods will be constructed for the validation of modeling assumptions and for testing the second-order properties of ranking algorithms and information webs. If successful, the results of this research will provide a new framework for the analysis and design of customized ranking algorithms with a predetermined typical behavior, which will result in algorithms better tailored to the diverse requirements of specific application areas. Given that the work will pursue analytically tractable approximation methods, it is expected to provide considerable new insight and design rules of thumb for ranking algorithms.
Furthermore, the developed mathematical techniques will significantly enrich the existing literature on weighted stochastic recursions, heavy-tailed large deviations, weighted branching processes, and efficient simulation methods. This work is also expected to have a substantial broader impact, since the preceding mathematical disciplines are heavily used in a wide variety of application areas that include the analysis of algorithms, biology, and statistical mechanics.
