
CAREER: Finite-State Machine Learning on Strings and Sequences

$500,000 | FY2004 | CSE | NSF

Johns Hopkins University, Baltimore MD

Investigators

Abstract

This CAREER project aims to create a software infrastructure for statistically modeling sequence data. Users will be able to specify, train, apply, combine, and share models. For instance, to build a system for extracting information from text, or translating text into another language, one might combine several other researchers' models of various linguistic phenomena. The resulting composite model reflects expert knowledge in its structure, but it also has free parameters that can be trained on appropriate data.

The technical approach is to represent statistical models with weighted multi-tape finite-state automata. Sequence data, sequence processing tools, and relational databases can also be represented in this format. All these resources can be efficiently combined, using a flexible regular-expression language, because this class of automata is closed under many useful operations. As part of building the software infrastructure, the project will investigate improved search strategies and training algorithms. Within the PI's specialty of language and speech processing, it will also develop some useful models.

This project is designed to allow more people to succeed more quickly at building more accurate and efficient speech and NLP software for more domains and applications, thus lowering the barriers to entry in modern language and speech technology. Besides reaching out to communities through easy-to-use software and clear tutorials, the project will develop CS-specific course materials that gradually dig down to reveal the fundamental theory and algorithms.
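As an illustration of the closure property mentioned in the abstract, the sketch below combines two weighted automata by intersection (the standard product construction). It is not the project's software: the `WFSA` class, the `intersect` function, and the two toy models are hypothetical names invented for this example, and it simplifies to single-tape, epsilon-free acceptors with probability-style weights rather than the multi-tape automata the abstract describes.

```python
from collections import defaultdict
from itertools import product


class WFSA:
    """Toy weighted finite-state acceptor (single tape, no epsilon arcs)."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = dict(finals)          # state -> final weight
        self.arcs = defaultdict(list)       # (state, symbol) -> [(next, weight)]
        for src, sym, dst, weight in arcs:
            self.arcs[(src, sym)].append((dst, weight))

    def score(self, seq):
        """Sum the weights of all accepting paths for seq (forward algorithm)."""
        forward = {self.start: 1.0}
        for sym in seq:
            nxt = defaultdict(float)
            for state, w in forward.items():
                for dst, arc_w in self.arcs.get((state, sym), []):
                    nxt[dst] += w * arc_w
            forward = nxt
        return sum(w * self.finals.get(s, 0.0) for s, w in forward.items())


def intersect(a, b):
    """Product construction: the result scores x as a.score(x) * b.score(x)."""
    arcs = []
    for (sa, sym_a), outs_a in list(a.arcs.items()):
        for (sb, sym_b), outs_b in list(b.arcs.items()):
            if sym_a != sym_b:
                continue
            for (da, wa), (db, wb) in product(outs_a, outs_b):
                arcs.append(((sa, sb), sym_a, (da, db), wa * wb))
    finals = {(fa, fb): wa * wb
              for fa, wa in a.finals.items()
              for fb, wb in b.finals.items()}
    return WFSA((a.start, b.start), finals, arcs)


# Model 1 (assumed toy data): a one-state unigram-style model over {a, b}.
lm = WFSA(0, {0: 0.25}, [(0, "a", 0, 0.5), (0, "b", 0, 0.25)])

# Model 2 (assumed toy data): a nondeterministic constraint that
# accepts, with weight 1, only strings ending in "b".
ends_in_b = WFSA(0, {1: 1.0},
                 [(0, "a", 0, 1.0), (0, "b", 0, 1.0), (0, "b", 1, 1.0)])

# The combined model is again an automaton of the same kind.
combined = intersect(lm, ends_in_b)
print(combined.score("ab"), combined.score("ba"))
```

Because the result of `intersect` is itself a `WFSA`, it can be scored or combined again in the same way, which is the sense in which closure under an operation lets independently built models be composed.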
