SBIR Phase II: Representation and Deep Learning for Free Text Applications

$943,539 | FY2015 | TIP | NSF

Textician, LLC, Cambridge, MA

Abstract

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project derives from an enhanced capability for automated processing of free text and other structured data. Motivation for this approach comes from neural networks and, in turn, it has applications to neural modeling and our understanding of how the brain processes information. In the software industry, commercial innovation continues to revolve around automated processing of web pages, which plays a key role in creating many new companies. Therefore, the ability to automate decision-making from free text is increasing in importance. A better way to represent text for use with machine learning will open new capabilities wherever the structure of sentences must be taken into account. This has the potential to lead to new startup ventures, thereby resulting in new products and services. A successful project will result in platform technology that can provide a substantial competitive edge to companies that take advantage of it, provide new and better capabilities for consumers, and help advance the nation's lead in technological innovation.

This Small Business Innovation Research (SBIR) Phase II project seeks to further develop new ways to process textual material so that computers can better learn applications related to natural language. Applications include sentiment analysis (assigning either positive or negative views to a body of text), summarization of documents, and classification of documents using multiple labels from a fixed set of many classes. The project will further develop new techniques to improve performance, and prototype components for transforming text and applying computerized learning methods. The new techniques represent words simultaneously with document structure using a single high-dimensional vector (for example, a list of 1,000 numbers). The project is aimed at improving computational capabilities involving documents, web pages, and other text as well as providing new techniques that can be applied to automated translation, better computer understanding of images, and genomic information.
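
As a rough illustration of what representing words together with structure in one fixed-length vector can look like, the sketch below encodes a short word sequence as a single list of 1,000 numbers. It is a minimal sketch under stated assumptions, not the project's method: the random +1/-1 word vectors, the cyclic shift used to mark word position, and the names `random_vector`, `rotate`, `encode`, and `cosine` are all hypothetical choices made for this example.

```python
import math
import random

DIM = 1000  # vector length, matching the abstract's "list of 1,000 numbers"


def random_vector(rng, dim=DIM):
    """Random +1/-1 vector (an assumed, illustrative word representation)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def rotate(vec, k):
    """Cyclically shift vec right by k places.

    Assumption: the shift stands in for tying a word to its position k.
    """
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def encode(words, lexicon):
    """Sum position-shifted word vectors into one fixed-length vector."""
    total = [0.0] * DIM
    for pos, word in enumerate(words):
        shifted = rotate(lexicon[word], pos)
        total = [t + s for t, s in zip(total, shifted)]
    return total


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)  # fixed seed so the run is repeatable
lexicon = {w: random_vector(rng) for w in ["dog", "bites", "man"]}

# Same words, different order: the two encodings are distinct vectors.
v1 = encode(["dog", "bites", "man"], lexicon)
v2 = encode(["man", "bites", "dog"], lexicon)
print(len(v1), round(cosine(v1, v2), 3))
```

Because each word vector is shifted by its position before summing, the two sentences above share only the contribution of "bites" in the middle slot, so their vectors differ even though the word sets are identical. A plain bag-of-words sum would not distinguish them.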
