
CI-P: Planning for a Multilingual FrameNet Lexical Resource

$99,825 · FY2014 · CSE · NSF

International Computer Science Institute, Berkeley CA


Abstract

The FrameNet Project (http://framenet.icsi.berkeley.edu) at the International Computer Science Institute in Berkeley, California, has been building a unique dictionary of everyday English words that connects words with semantic frames, each representing a type of event or relation and the roles of the people or objects that participate in it. For example, the Kinship frame includes the words 'nephew', 'mother-in-law', and 'sibling', and the Leadership frame includes both nouns like 'corporal', 'bishop', 'CEO', and 'headmaster' and verbs like 'lead', 'preside', and 'rule'. Since 1997, the FrameNet project has developed more than 1,100 such frames covering more than 12,700 word senses, and has manually annotated almost 200,000 examples showing how these semantic roles fit into the grammar of their sentences. The database is freely available and is downloaded daily by users around the world for NLP applications such as question answering and reasoning about the causes and effects of events. Projects elsewhere are building FrameNet-style databases for many other languages, including Spanish, German, Japanese, Chinese, French, Brazilian Portuguese, and Arabic. These projects have largely followed the English FrameNet example, using the same semantic frames and roles, and they have found that roughly 80% of the lexical units in their target languages fit well into semantic frames originally defined for English. However, there has been no broad, systematic effort to align these databases into a freely available, unified multilingual frame semantic resource. This award supports planning such an effort and laying the groundwork for connecting all of these FrameNets into one multilingual database, which will enable many new NLP applications, such as frame-based machine translation and recognizing when news accounts in two different languages are discussing the same event.
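To make the word-to-frame connection concrete, here is a minimal Python sketch of a frame lexicon, using only the example words named above. The class and function names (`LexicalUnit`, `Frame`, `frames_for`) and the "n"/"v" part-of-speech labels are hypothetical choices for this illustration, not FrameNet's actual data format; participant roles are modeled as a field but left empty rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    """A word sense: a lemma paired with its part of speech."""
    lemma: str
    pos: str  # "n" for noun, "v" for verb (hypothetical labels for this sketch)


@dataclass
class Frame:
    """A semantic frame and the lexical units that evoke it."""
    name: str
    lexical_units: list[LexicalUnit] = field(default_factory=list)
    # Participant roles; intentionally left empty here rather than guessed.
    roles: list[str] = field(default_factory=list)


def frames_for(lemma: str, frames: list[Frame]) -> list[str]:
    """Return the names of all frames that list `lemma` as a lexical unit."""
    return [f.name for f in frames if any(lu.lemma == lemma for lu in f.lexical_units)]


# The two example frames from the abstract, restricted to the words it mentions.
kinship = Frame("Kinship", [LexicalUnit(w, "n") for w in ("nephew", "mother-in-law", "sibling")])
leadership = Frame(
    "Leadership",
    [LexicalUnit(w, "n") for w in ("corporal", "bishop", "CEO", "headmaster")]
    + [LexicalUnit(w, "v") for w in ("lead", "preside", "rule")],
)

print(frames_for("preside", [kinship, leadership]))  # ['Leadership']
print([lu.lemma for lu in leadership.lexical_units if lu.pos == "v"])  # ['lead', 'preside', 'rule']
```

`frames_for` returns a list rather than a single name because, in a full lexicon, one spelling can evoke several frames, with each frame holding a separate lexical unit for that sense.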
During this planning project, the investigators are working out how to align the separate FrameNets, based in part on frame names (other FrameNets use either the English frame names or translations of them) and in part on quantitative measures of the similarity of frames, lexical units, and semantic roles. The quantitative measures will exploit the networks created by frame-to-frame relations in each language, comparing not only individual frames but also their neighbors in the network, their corresponding semantic roles, and the similarity of the fillers of corresponding roles across languages. In this way, correspondences will be established between the often quite different syntactic constructions (valence patterns) in which these roles are realized across languages, similar to the projection methodology used to create FrameNets in other languages. The plan includes both face-to-face and virtual meetings to define community requirements and priorities and to gather suggestions from researchers and developers on alignment methods, application interfaces, and related issues. As a result of this project, the investigators plan to prepare a comprehensive proposal to the CISE Research Infrastructure Program to create the multilingual database, guided by continuing consultation both with the teams developing the various FrameNets and with those already using FrameNet in research and practical applications.
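The alignment idea described above, combining a name match with overlap among network neighbors and semantic roles, can be illustrated with a small scoring sketch. This is not the investigators' method: the weighted sum, the Jaccard overlap, the weights (0.5/0.3/0.2), the greedy best-match choice, and every name in the example data (the Spanish frame "Parentesco", the placeholder neighbors "RelA" through "RelC", and the illustrative role names) are assumptions made for this example. Role-filler similarity, also mentioned above, is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FrameNode:
    """A frame as seen by a hypothetical aligner: its name, its neighbors in the
    frame-to-frame relation network, and its semantic role names."""
    name: str
    neighbors: set[str] = field(default_factory=set)
    roles: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def frame_similarity(en: FrameNode, other: FrameNode, to_english: dict[str, str],
                     w_name: float = 0.5, w_nbr: float = 0.3, w_role: float = 0.2) -> float:
    """Weighted sum of name match, neighbor overlap, and role overlap (assumed weights).
    Names in `other` are mapped to English via `to_english`; unmapped names pass through."""
    def tr(s: str) -> str:
        return to_english.get(s, s)

    name_score = 1.0 if tr(other.name) == en.name else 0.0
    nbr_score = jaccard(en.neighbors, {tr(n) for n in other.neighbors})
    role_score = jaccard(en.roles, {tr(r) for r in other.roles})
    return w_name * name_score + w_nbr * nbr_score + w_role * role_score


def best_match(other: FrameNode, english_frames: list[FrameNode],
               to_english: dict[str, str]) -> tuple[str, float]:
    """Greedily pick the English frame with the highest similarity score."""
    scored = [(f.name, frame_similarity(f, other, to_english)) for f in english_frames]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: placeholder neighbor names, illustrative role names.
english = [
    FrameNode("Kinship", {"RelA", "RelB"}, {"Ego", "Alter"}),
    FrameNode("Leadership", {"RelC"}, {"Leader", "Governed"}),
]
spanish = FrameNode("Parentesco", {"RelA_es"}, {"Ego", "Alter"})
to_english = {"Parentesco": "Kinship", "RelA_es": "RelA"}

name, score = best_match(spanish, english, to_english)
print(name, round(score, 2))  # Kinship 0.85
```

Scoring neighbors as well as names lets structural evidence from the relation network support or weaken a match even when a frame name was translated loosely or not at all; a real alignment would also need a policy for ties and one-to-many correspondences.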
