EAGER: Learning to Efficiently Rank with Cascades

$150,000 | FY2011 | CSE | NSF

University Of Maryland, College Park, College Park MD

Abstract

Text search is undeniably vital to today's information-based societies, helping users locate relevant information in web pages, journal articles, news stories, blogs, emails, tweets, and a myriad of other sources. Naturally, users desire results that are not only good but also fast. Learning to rank, the dominant approach to information retrieval (IR) today, focuses almost exclusively on effectiveness, often neglecting the runtime speed (i.e., efficiency) of the ranking functions. This project contributes to the emerging research area of learning to efficiently rank, which aims to let algorithm designers capture, model, and reason about tradeoffs between effectiveness and efficiency in a unified framework. Specifically, this project explores a novel cascade model for retrieval, where ranking is broken into a finite number of distinct stages. Each stage considers successively richer and more complex features, but over successively smaller candidate document sets. The intuition is that although complex features are more time-consuming to compute, examining fewer documents offsets the additional overhead. In other words, the cascade model views retrieval as a multi-stage progressive refinement problem. Based on a survey of the current state of the art, this is, to the best of current knowledge, the first project to explore this approach to the ranking problem, marking a substantial departure from previous "monolithic" ranking functions. Although exploration in this uncharted area carries some risk, this research promises to open up a new frontier in IR research. This project aims to narrow the chasm between academic and industrial IR research by bringing together theoretical IR research and practical considerations in "real-world" search. It is expected that the cascade model will be of interest to web search engine companies, thus providing a path from the exploratory research results to significant impact in production systems.
Furthermore, this work dovetails with the emerging area of green computing: more efficient algorithms use less energy and hence help reduce the environmental footprint of web-scale services. The project web site (http://www.umiacs.umd.edu/~jimmylin/projects/) includes more information about this project and will be used for the release of a prototype as part of the Ivory open-source retrieval toolkit.
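
To make the multi-stage refinement idea concrete, the following is a minimal illustrative sketch in Python. It is not the project's implementation and is not taken from the Ivory toolkit. The function names (cascade_rank, term_overlap, proximity), the two toy features, and the per-stage cutoffs are all assumptions chosen only to show a cheap feature applied to many documents followed by a costlier feature applied to the few that survive.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration: a scorer maps (query, document) to a score.
Scorer = Callable[[str, str], float]
Stage = Tuple[Scorer, int]  # (scoring function, number of candidates to keep)


def cascade_rank(query: str, docs: Sequence[str], stages: Sequence[Stage]) -> List[str]:
    """Pass a shrinking candidate set through successive ranking stages."""
    candidates = list(docs)
    for scorer, keep in stages:
        # Score only the survivors of the previous stage, then prune to `keep`.
        candidates.sort(key=lambda d: scorer(query, d), reverse=True)
        candidates = candidates[:keep]
    return candidates


def term_overlap(query: str, doc: str) -> float:
    """Cheap toy feature: how many query terms appear in the document."""
    doc_terms = set(doc.lower().split())
    return float(sum(t in doc_terms for t in query.lower().split()))


def proximity(query: str, doc: str) -> float:
    """Costlier toy feature: reward query terms that occur close together."""
    terms = set(query.lower().split())
    positions = [i for i, w in enumerate(doc.lower().split()) if w in terms]
    if len(positions) < 2:
        return 0.0
    return 1.0 / (1 + max(positions) - min(positions))


if __name__ == "__main__":
    docs = [
        "learning to rank with cascades",
        "learning about rivers that rank high",
        "green computing saves energy",
    ]
    # Stage 1 keeps two candidates using the cheap feature;
    # stage 2 reranks only those two with the costlier feature.
    stages = [(term_overlap, 2), (proximity, 1)]
    print(cascade_rank("learning rank", docs, stages))
```

In this sketch the costlier scorer never sees the document that the first stage pruned, which is the source of the efficiency gain the abstract describes. How many stages to use, which features each stage computes, and where to set each cutoff are exactly the tradeoffs a learned cascade would need to decide; here they are fixed by hand.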
