Collaborative Research: HCC: Medium: Linguistically-Driven Sign Recognition from Continuous Signing for American Sign Language (ASL)

$405,106FY2022CSENSF

Trustees Of Boston University, Boston

Investigators

Abstract

There is currently no method to segment and recognize signs from videos of continuous signing. In this project, computer-based sign recognition techniques developed for American Sign Language (ASL) will be extended from isolated, citation-form signs to signs in sentences. Sign recognition from continuous signing is a challenging task, in part because the articulation of one sign is often affected by that of neighboring signs. Much of the recent deep-learning research has focused on short videos of continuous signing using an “unsegmented” approach, in which the words in a video are identified without detection of start and end times for each sign. However, ASL utterances typically contain signs of different types (e.g., lexical, fingerspelled, classifier constructions) that have significantly different internal structures, requiring distinct recognition strategies. Thus, “segmented” recognition to identify video sub-durations containing distinct sign types is necessary for a complete recognition architecture. Segmented recognition would also enable automatic time-stamped annotation of ASL videos to produce training data for AI research, as well as powerful tools to provide ASL learners (who often have trouble parsing continuous signing) with a word-segmented analysis of videos. The new methods and all data collected for this project will be shared publicly, facilitating new research in computer vision, graphics, HCI, ASL, linguistics, and other related sciences. The research will also pave the way for a wide variety of technologies to improve communication between deaf and hearing individuals, such as: ASL-to-English translation (for which sign recognition is a precursor); educational applications to support ASL learners; and Google-like sign search by example over videos on the Web. Additional broad impact will derive from project outcomes because these same technologies can be applied to other signed languages, and because the new methods will be incorporated into educational programs in the PIs’ institutions. Beyond these societal impacts and benefits for research and education, the PIs will continue their long tradition of recruiting students who are deaf or hard-of-hearing, as well as members of other underrepresented groups, to participate in this research. This project will develop a novel end-to-end machine learning approach with the following key components: (1) segmentation of regions from continuous signing that contain distinct types of ASL signs (based on 2D skeleton data extracted from a window of a specific number of frames, using AlphaPose); (2) segmentation of the individual signs within those regions (using a spatiotemporal GCN approach); and (3) recognition of the segmented signs (using a transformer for sign classification). The recognition of segmented signs will leverage linguistic constraints applicable to the recognized sign type, with a focus in this project on lexical signs, and will also consider coarticulation effects. To these ends, the research will build on the PIs’ large, publicly shared, linguistically annotated, video corpora of isolated signs and continuous signing. This approach, with explicit sign type detection, sign segmentation, and type-specific recognition strategies for segmented signs, is essential for successful continuous sign recognition at scale and for a wide range of applications. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →