Probabilistic and link-based Methods for Exploiting Very Large Textual Repositories

$310,000 | FY2003 | CSE | NSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI

Investigators

Abstract

This research project addresses the disconnect between the way humans ask questions on the Web and the interfaces of existing state-of-the-art search engines. Search engines require users to formulate their requests in idiosyncratic query languages whose syntax is unnatural and hard for typical users to learn. Furthermore, existing search engines are notoriously bad at returning documents that contain none of the user's query terms yet are relevant to the user's information need.

The proposed work focuses on two areas of research: (1) probabilistic question-to-query transformation (query modulation) for Web access, and (2) models of content transfer over Web links. The approach for (1) involves designing and evaluating algorithms and systems for the automatic, rule-based conversion of natural language queries into the query languages of specific search engines. Part (2) facilitates the retrieval of relevant Web documents through the links that point to them from other relevant documents.

The expected outcomes and impact of this project are threefold: (1) a better understanding of the interaction between document retrieval and question answering in a Web environment, (2) better models of how document relevance is transferred over the Web hypergraph, and (3) better algorithms for natural language access to the Web, which will make it easier for millions of Web users to find the information they need in a timely, accurate, and intuitive way. All findings and artifacts developed under this grant will be widely disseminated and incorporated into a public-domain search engine, and the results will be accessible via the project Web site (http://tangra.si.umich.edu/clair).
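To make the idea of relevance transfer over links concrete, here is a minimal sketch. The abstract does not specify the project's model, so this swaps in a personalized-PageRank-style propagation as an illustrative assumption: pages start with a text-match relevance score, and at each step a fraction of every page's score (`damping`) is split evenly among the pages it links to. The function name `propagate_relevance`, its parameters, and the toy link graph are hypothetical.

```python
from typing import Dict, List


def propagate_relevance(
    links: Dict[str, List[str]],
    seed: Dict[str, float],
    damping: float = 0.5,
    iterations: int = 50,
) -> Dict[str, float]:
    """Spread seed relevance along hyperlinks (personalized-PageRank style).

    Hypothetical illustration, not the project's published model.
    links:   page -> list of pages it links to
    seed:    initial text-match relevance per page (absent pages get 0)
    damping: fraction of a page's score passed on to the pages it links to
    """
    # Every page that appears anywhere in the graph or the seed scores.
    pages = set(links) | {t for ts in links.values() for t in ts} | set(seed)
    total = sum(seed.values())
    # Normalize seed scores into a distribution over pages.
    base = {p: (seed.get(p, 0.0) / total if total else 0.0) for p in pages}
    scores = dict(base)
    for _ in range(iterations):
        # Each page keeps (1 - damping) of its own seed relevance...
        nxt = {p: (1 - damping) * base[p] for p in pages}
        # ...and receives an equal share of each linking page's current score.
        # Pages with no out-links simply pass nothing on (mass leaks there).
        for src, targets in links.items():
            if targets:
                share = damping * scores[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        scores = nxt
    return scores


if __name__ == "__main__":
    # "c" contains no query terms but is linked from two matching pages.
    graph = {"a": ["c"], "b": ["c"], "c": [], "d": []}
    print(propagate_relevance(graph, {"a": 1.0, "b": 1.0}))
```

In this toy graph, page `c` ends up with a nonzero score purely because relevant pages `a` and `b` link to it, while the unlinked page `d` stays at zero. That is the kind of document a term-matching engine would miss.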
