Speech Across Dialects of English (SPADE): large-scale digital analysis of a spoken language across space and time

$199,791 | FY2017 | SBE | NSF

North Carolina State University, Raleigh NC

Investigators

Jeff Mielke (contact), Erik Thomas, Paul Fyfe, Robin Dodsworth, Tyler Kendall

Abstract

This project focuses on new and fast ways to analyze speech across dialects. The focus here is on dialects of English, but the process can eventually be used for any language. The researchers take methods from computer science and put them to work with tools and methods from speech science, linguistics, and digital humanities to find out objectively how much the sounds of English dialects across the Atlantic vary now and varied in the past. Scholars across the humanities and social sciences already routinely analyze huge amounts of written English quickly and easily. In fact, electronic tools for searching texts and obtaining visual summaries within seconds are available to anyone. However, speech research is only now entering its own "big data" revolution. Past linguistic research tended to carry out detailed analyses of a few aspects of speech from one or a few languages or dialects. The current scale of speech research studies has shaped our understanding of spoken language and the kinds of questions that we ask. Today, massive digital collections of transcribed speech are available from many languages, gathered for numerous purposes: from oral histories to large datasets for training speech recognition systems to legal and political interactions. Sophisticated speech processing tools exist to analyze these data, but they require substantial technical skill. Combining these data and tools allows linguists to answer fundamental questions about the nature and development of spoken language. This collaborative project seeks to establish the key tools to enable large-scale speech research to become as powerful and pervasive as large-scale text mining. This ability to quickly and easily analyze speech in many English dialects should be applicable in computational, forensic, and clinical approaches to speech, and useful to literary scholars, sociologists, anthropologists, historians, and political scientists.
What is learned from this project will be shared with the public through an interactive sound mapping website. This award was made as part of Round 4 of the Digging Into Data Challenge, an international funding opportunity designed to foster research collaboration across countries and to encourage innovative approaches to analyzing large data sets in the social sciences and humanities. The U.S.-based researchers will collaborate with scholars in Canada and the U.K. to achieve the goals of this project.
