III: Small: Efficient Query Processing in Large Search Engines
New York University, New York NY
Investigators
Abstract
The largest web search engines now receive hundreds of millions of queries per day that need to be answered in fractions of a second on collections of tens of billions of web documents. In order to process all these queries, search engines consume increasing amounts of hardware and energy resources. This project focuses on developing new algorithms, index structures, and other software techniques for scaling query processing in search engines, that is, techniques that allow queries to be executed faster and on larger data sets using fewer hardware and energy resources. Research activities in this project focus on three main approaches. First, the project studies how index size and access time can be reduced through improved index compression techniques. Second, work on new early termination techniques considers how the top results for a query can be computed without exhaustive traversal of the index structures for the query terms, both for simple ranking functions such as BM25 or Cosine and for the more complex functions with many features used by current web search engines. Finally, the project explores general techniques for query optimization in information retrieval (IR) systems, inspired by the significant body of work on query optimizers in database systems. Web search engines are a multi-billion dollar industry and a crucial component of the internet. Techniques resulting from this project are expected to benefit this industry by reducing the hardware cost and energy consumption of large-scale search services. Results will be disseminated through publications in major conferences and journals, tutorials at conferences, distribution of software libraries, and contributions to existing software tools such as Lucene. This project provides research and educational opportunities for graduate and undergraduate students and prepares them for later work at companies, research labs, or universities.
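As a rough illustration of the index compression direction mentioned in the abstract, the sketch below shows a classic textbook baseline: storing the sorted document IDs of an inverted list as gaps and encoding each gap with a variable-byte code. It is not taken from the project itself; the function names (to_gaps, varbyte_encode, and so on) and the sample posting list are assumptions made for illustration only.

```python
from typing import Iterable, List


def to_gaps(doc_ids: Iterable[int]) -> List[int]:
    """Turn strictly increasing docIDs into gaps (first gap = first docID)."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps: Iterable[int]) -> List[int]:
    """Inverse of to_gaps: a running sum restores the original docIDs."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def varbyte_encode(numbers: Iterable[int]) -> bytes:
    """Encode non-negative integers, 7 payload bits per byte.

    Assumed convention for this sketch: low-order bits come first, and the
    high bit is set only on the last byte of each number.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 128:
            out.append(n & 0x7F)  # more bytes follow, high bit clear
            n >>= 7
        out.append(n | 0x80)      # terminating byte, high bit set
    return bytes(out)


def varbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by varbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:              # terminating byte: finish this number
            n |= (b & 0x7F) << shift
            numbers.append(n)
            n, shift = 0, 0
        else:                     # continuation byte: accumulate 7 bits
            n |= b << shift
            shift += 7
    return numbers


# Hypothetical posting list for one term.
postings = [3, 7, 8, 150, 100000]
encoded = varbyte_encode(to_gaps(postings))
decoded = from_gaps(varbyte_decode(encoded))
```

Because small gaps fit in a single byte, frequent terms with dense posting lists compress well under this scheme; more refined codes trade some of this simplicity for better size or decoding speed.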
The project web site (http://cis.poly.edu/westlab/queryproc/) provides more information about this project.