ITR: TeraScale Retrieval
University of North Carolina at Chapel Hill, Chapel Hill, NC
Investigators
Abstract
ITR-0082655. Newby, Gregory B., School of Information and Library Science, University of North Carolina at Chapel Hill. ITR/SW: TeraScale Retrieval.

TeraScale Retrieval addresses the scientific investigation of large-scale information retrieval (IR). The project will facilitate experimentation to advance knowledge of information retrieval, especially text retrieval, by implementing a software toolkit for IR research and development. IR systems seek to identify documents, or passages from documents, that satisfy a human information need. Text retrieval is directed at collections of relatively unstructured documents, such as HTML documents, as well as more structured documents (e.g., XML). To advance scientific knowledge and improve IR performance, there is a need for a software toolkit that enables rapid and practical implementation of experimental IR systems. The TeraScale Retrieval toolkit will emphasize large-scale performance with terascale datasets: hundreds of millions of documents with terabytes of raw data, millions of unique terms, multiple languages, and the potential for quadrillions (petascale) of sub-documents or document fragments. The toolkit will emphasize software reuse, high-performance algorithms, and modularity for rapid prototyping and evaluation. Rather than moving quickly from academic use to commercialization, TeraScale Retrieval will focus on experimentation and evaluation to contribute to scientific knowledge about information seeking and use.