SBIR Phase I: Concise Visualization of a Document Collection via Conceptual Clustering
Vivisimo, Inc., Pittsburgh PA
Investigators
Abstract
This Small Business Innovation Research (SBIR) Phase I project will spin off NSF-sponsored basic research on knowledge discovery from the computer science department at Carnegie Mellon University. The result will be commercial software that can convey the contents of hundreds or thousands of documents on one computer screen with minimal clicking and scrolling. This capability will serve information needs as diverse as search, overviewing, and browsing, and will alleviate the problem of information overload, which today confronts all retrievers of computer-based textual information. The basic approach is a new form of conceptual clustering that emphasizes the human describability of the resulting document clusters. The techniques combine classical hierarchical clustering with results from the PI's research on data-driven knowledge discovery, which focused on generating very concise and contrastive descriptions of a large number of classes (here, document clusters). The overall goal is to replace the nearly universal display of matching documents as a long ranked list, which is tedious and forces users into repeated and inefficient clicking, backtracking, and scrolling. The potential market opportunities include any domain where more than a few dozen relevant matches are returned for typical information queries, such as web searches, news, patents, scientific research abstracts, proprietary corporate information, and, more generally, the content delivered by the numerous vendors of specialized information services.
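As a rough illustration of the general idea described in the abstract (hierarchical clustering followed by concise, contrastive cluster descriptions), the following minimal Python sketch groups a few toy documents and labels each group. It is not the project's actual method. The Jaccard word-overlap similarity, the average-linkage merge rule, the labeling rule (words shared by every document in a cluster and absent from all other documents), the function names `tokenize`, `jaccard`, `cluster`, and `describe`, and the sample documents are all assumptions made for this example.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase a document and return its set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, k):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns a sorted list of clusters, each a sorted list of document indices.
    """
    words = [tokenize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the documents of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard(words[i], words[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        # Find and merge the two most similar clusters.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for n, c in enumerate(clusters) if n not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


def describe(docs, clusters, max_terms=2):
    """Label each cluster with words that all of its documents share
    and that no document outside the cluster contains (hypothetical rule)."""
    words = [tokenize(d) for d in docs]
    labels = []
    for c in clusters:
        shared = set.intersection(*(words[i] for i in c))
        outside = set().union(*(words[i] for i in range(len(docs)) if i not in c))
        labels.append(sorted(shared - outside)[:max_terms])
    return labels


# Toy documents, invented for illustration only.
docs = [
    "patent filing for battery electrode",
    "patent claims on battery charging",
    "web search ranking algorithm",
    "web search query logs",
]
clusters = cluster(docs, 2)
labels = describe(docs, clusters)
for members, label in zip(clusters, labels):
    print(label, members)
```

Here the short labels stand in for the "concise and contrastive descriptions" mentioned above: a user could scan a handful of labels on one screen rather than scroll through a ranked list of every matching document.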