REU Site: Animal Language Processing and Understanding

$483,804 · FY2024 · CSE · NSF

University of Texas at Arlington, Arlington, TX

Investigators

Abstract

What animals are talking about is a fascinating research topic. Previous attempts to understand animal communications relied on behavioral observations and, given the high costs, studied only a handful of species. None of these attempts adopted a computational and systematic approach to associating animal vocal sounds with written symbols and meanings, as is typically done with human languages. This project investigates whether animals have languages that are syntactically and semantically compositional like human languages, using current advanced machine learning methodology. In a preliminary study, the team developed an "animal language processing" pipeline that cleans and preprocesses a large number of YouTube videos about domestic dogs, and then transcribes the audio segments of dog vocalizations into a sequence of phonetic symbols. In this project, undergraduate researchers will collect high-quality, partially annotated multimedia animal communication data, feed it into this pipeline, analyze the resulting transcripts, and gain new insights into the languages of new species. The work conducted by the undergraduate researchers will also develop into an open-source, web-based animal language study platform called AniVoice. The outcome of this project will be a step toward answering the big science question noted above, and will eventually help us better understand the world around us. This project will also support the undergraduate participants in developing expertise in interdisciplinary research.

This project includes four technical milestones. First, the researchers acquire high-quality, cleaned vocalization data of a target animal species, partially annotated with the activity or scenario the animal is involved in. Data is collected either by live recording or by downloading relevant video clips from the internet. Second, the researchers segment the audio files of animal communications into minimum units of audible sound, much like the syllables of human speech, and then group regularly recurring syllable sequences into animal "words". This poses significant challenges because human beings do not know how to parse animal communications. Third, the researchers develop models to automatically understand the semantics of the words by looking at the activity or location of the animal when a certain word is uttered. This requires the implementation of video understanding algorithms, in particular scene and activity classification, as well as "active speaker" detection in scenes where multiple animals are present. Finally, the undergraduate researchers implement a web-based crowdsourcing interface where animal lovers or researchers can upload their recordings, annotate their data using the tools provided, and evaluate the transcription and semantic analysis results from the animal language processing pipeline. The AniVoice platform will be used as a tool to advance animal language studies in the future. Each year of this project, a cohort of undergraduate researchers will contribute by continuously improving the processing pipeline, adding experimental evidence on more animal species, and improving the AniVoice web service. The analysis performed by the student researchers is expected to provide additional scientific evidence of the language abilities of target species, which may lead to new insights by biologists and behavioral ecologists. AniVoice opens opportunities for data-driven and interdisciplinary research collaboration on animal communications. The knowledge gained in this project may also be useful in the study of ancient or extinct human languages. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
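As a rough illustration of the second and third milestones, the sketch below groups recurring syllable sequences into candidate "words" and tallies the activity labels of the clips in which each candidate occurs. It is a minimal sketch under stated assumptions, not the project's pipeline: simple n-gram counting stands in for whatever grouping method the team uses, and the function names, thresholds, syllable symbols, and activity labels are all hypothetical.

```python
from collections import Counter, defaultdict


def candidate_words(transcripts, min_len=2, max_len=3, min_count=2):
    """Count contiguous syllable n-grams and keep those seen at least min_count times.

    Assumption: n-gram frequency is used here only as a stand-in for a real
    word-discovery method.
    """
    counts = Counter()
    for syllables in transcripts:
        for n in range(min_len, max_len + 1):
            for i in range(len(syllables) - n + 1):
                counts[tuple(syllables[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


def activity_profile(transcripts, activities, words):
    """For each candidate word, count the activity labels of the clips that contain it."""
    lengths = {len(w) for w in words}
    profile = defaultdict(Counter)
    for syllables, activity in zip(transcripts, activities):
        seen = set()
        for n in lengths:
            for i in range(len(syllables) - n + 1):
                gram = tuple(syllables[i:i + n])
                if gram in words:
                    seen.add(gram)
        # Each clip contributes at most once per word, however often the word repeats.
        for gram in seen:
            profile[gram][activity] += 1
    return profile


# Hypothetical toy data: one syllable sequence and one activity label per clip.
transcripts = [["a", "b", "c"], ["a", "b", "d"], ["c", "a", "b"], ["d", "d"]]
activities = ["play", "play", "eat", "rest"]

words = candidate_words(transcripts, min_len=2, max_len=2, min_count=2)
profile = activity_profile(transcripts, activities, words)
```

Here the only bigram that recurs is ("a", "b"), and its activity counts show which annotated scenarios it tends to accompany. A real analysis would need far more data and a statistical test before reading any meaning into such counts.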
