DC: Small: An Integrated Architecture for Federated Search

$499,671 | FY2009 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

This project is developing a general solution to two new federated search problems. First, massive web datasets are made more manageable by dividing them into topic-oriented "shards" and searching only a few shards per query, which reduces computational costs dramatically. Second, integration of focused "vertical search services" into general-purpose search portals is made more manageable by a framework that uses multiple techniques to characterize the contents of each resource and to track how its content and query traffic change over time. Introducing resource definition policies and diverse information into federated search requires solving a variety of new resource representation, resource selection, and result merging problems. The research also addresses the requirements of dynamic resources and looks beyond average-case analysis to characterize the range of accuracy that a federated search service experiences.

Reducing the computational cost of searching massive web corpora enables greater academic study of large web datasets and lowers costs for web search companies. A comprehensive framework for integrating specialized information services in web portals makes it easier for commercial web search portals to deploy new search services. New algorithms are being disseminated in the open-source Lemur Toolkit, making them readily accessible. Datasets are being published in a form that lets other researchers recreate them precisely or closely. Queries and relevance judgments are also being published for use by other researchers.

This project extends research done under IIS-0841275, SGER: Multi-Tier Indexing for Web Search Engines. Project URL: http://www.cs.cmu.edu/~callan/Projects/IIS-0916553/
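
The shard-based idea in the abstract (represent each resource, select a few shards per query, then merge their results) can be illustrated with a minimal Python sketch. It is not the project's algorithm and does not use the Lemur Toolkit API. The shard contents, the smoothed unigram scoring, and every function name below are assumptions chosen only to make the three steps concrete.

```python
import math
from collections import Counter

# Hypothetical toy corpus: three topic-oriented shards of short documents.
SHARDS = {
    "sports": {"s1": "football match score goal", "s2": "tennis match final"},
    "cooking": {"c1": "pasta recipe tomato sauce", "c2": "bread recipe flour yeast"},
    "finance": {"f1": "stock market index fund", "f2": "bank interest rate"},
}


def build_shard_model(docs):
    """Resource representation: summarize a shard as a bag of term counts."""
    counts = Counter()
    for text in docs.values():
        counts.update(text.lower().split())
    return counts


def shard_score(query, model, vocab_size, mu=1.0):
    """Log-likelihood of the query under an additively smoothed unigram model."""
    total = sum(model.values())
    return sum(
        math.log((model[term] + mu) / (total + mu * vocab_size))
        for term in query.lower().split()
    )


def select_shards(query, models, k):
    """Resource selection: keep the k shards most likely to match the query."""
    vocab = set()
    for model in models.values():
        vocab.update(model)
    ranked = sorted(
        models,
        key=lambda name: shard_score(query, models[name], len(vocab)),
        reverse=True,
    )
    return ranked[:k]


def search_shard(query, docs):
    """Score each document by how often the query terms occur in it."""
    terms = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = sum(tokens.count(term) for term in terms)
        if score > 0:
            results.append((score, doc_id))
    return results


def federated_search(query, shards, k=1):
    """Search only the selected shards, then merge results by score."""
    models = {name: build_shard_model(docs) for name, docs in shards.items()}
    chosen = select_shards(query, models, k)
    merged = []
    for name in chosen:
        merged.extend(search_shard(query, shards[name]))
    # Result merging: every shard uses the same scorer, so raw scores are comparable.
    merged.sort(key=lambda pair: (-pair[0], pair[1]))
    return chosen, [doc_id for _, doc_id in merged]


chosen, hits = federated_search("pasta recipe", SHARDS, k=1)
print(chosen, hits)
```

With k=1 only one of the three shards is searched, which is where the cost saving comes from. In a real system the shards would use different collection statistics, so merging raw scores as done here would generally need a normalization step.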
