CI-NEW: Multilingual FrameNet: A Resource Enabling Cross-Lingual Research for the Natural Language Processing Community

$607,574 · FY2016 · CSE · NSF

International Computer Science Institute, Berkeley CA

Investigators

Abstract

The FrameNet lexical semantic database records the meanings of words (and multi-word expressions) in everyday English; it is a sort of "super dictionary" that is both human-readable and machine-readable. The database is based on the fact that an individual word can evoke an entire situation in our minds, complete with roles for the people and things that participate in the situation. For example, the word hire evokes the situation of Employment, with roles for the Employer, the Employee, the Position, etc.; both the word vengeance and the expression get back at evoke Revenge, with the roles Avenger, Injured party, Injury, Offender, and Punishment. These situations are called semantic frames, and the project is guided by the theory of Frame Semantics, developed by the late Prof. Charles J. Fillmore of UC Berkeley. The FrameNet lexical database currently includes descriptions of more than 1,000 semantic frames, more than 13,000 senses of words and expressions (called Lexical units), and more than 200,000 manually annotated examples that show how the various roles are expressed by different parts of a sentence. The FrameNet database is widely used in natural language processing; it helps engineers create software that analyzes written texts into semantic frames and their participants, so that computers can reason about the situations described. Thousands of researchers and companies already use such software for applications including automatic analysis of reports from combat or natural disaster situations, understanding financial news reports, recognizing expressions of opinion on blogs and product websites, and searching clinical records and medical research reports. Although the frames were mainly created for English, most of them have been shown to be useful for other languages as well, and researchers around the world are now creating FrameNet databases for many other languages.
The Multilingual FrameNet project will align the databases for different languages, both at the level of semantic frames and at the level of lexical units. The aligned database will help to improve applications such as foreign language teaching, cross-linguistic information retrieval, and machine translation. The new project also includes setting up a website and software so that teachers and students everywhere can participate in the project by adding to English FrameNet, creating a more complete and more useful FrameNet for its many users.
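As a rough illustration of the ideas above, the following Python sketch models frames, roles, and lexical units, and shows one simple way lexical units from two languages could be grouped under a shared frame. It is only a sketch under stated assumptions: the class and function names (Frame, LexicalUnit, evoked_frame, align_by_frame) are hypothetical and are not part of any FrameNet software, the Spanish entries are illustrative and not drawn from any FrameNet database, and the project's actual alignment method is not described in this abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A semantic frame: a named situation plus its participant roles."""
    name: str
    roles: tuple


@dataclass(frozen=True)
class LexicalUnit:
    """One sense of a word or expression, tied to the frame it evokes."""
    lemma: str
    language: str
    frame: str


# Frames and roles as named in the abstract. The Employment role list is
# partial: the abstract ends it with "etc.".
FRAMES = {
    "Employment": Frame("Employment", ("Employer", "Employee", "Position")),
    "Revenge": Frame(
        "Revenge",
        ("Avenger", "Injured party", "Injury", "Offender", "Punishment"),
    ),
}

LEXICAL_UNITS = [
    # English examples taken from the abstract.
    LexicalUnit("hire", "en", "Employment"),
    LexicalUnit("vengeance", "en", "Revenge"),
    LexicalUnit("get back at", "en", "Revenge"),
    # Assumption: illustrative Spanish entries, not from a FrameNet database.
    LexicalUnit("contratar", "es", "Employment"),
    LexicalUnit("venganza", "es", "Revenge"),
]


def evoked_frame(lemma, language):
    """Return the frame evoked by a lexical unit, or None if unknown."""
    for lu in LEXICAL_UNITS:
        if lu.lemma == lemma and lu.language == language:
            return FRAMES[lu.frame]
    return None


def align_by_frame(source_lang, target_lang):
    """Group lexical units of two languages under the frames they share."""
    by_frame = defaultdict(lambda: {source_lang: [], target_lang: []})
    for lu in LEXICAL_UNITS:
        if lu.language in (source_lang, target_lang):
            by_frame[lu.frame][lu.language].append(lu.lemma)
    # Keep only frames that have at least one lexical unit in each language.
    return {
        name: langs
        for name, langs in by_frame.items()
        if langs[source_lang] and langs[target_lang]
    }


if __name__ == "__main__":
    print(evoked_frame("hire", "en"))
    print(align_by_frame("en", "es"))
```

Grouping by a shared frame name corresponds only to alignment at the level of semantic frames; aligning individual lexical units across languages would need finer-grained evidence than this sketch uses.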
