Workshop: Developing collection management tools to create more robust and reliable linguistic data

$123,216 | FY2016 | SBE | NSF

University Of Hawaii, Honolulu

Abstract

The world's linguistic and cultural diversity is encoded in the approximately 7,000 distinct languages spoken across the world. With many of these languages currently endangered or threatened, the creation of an enduring record of these languages is of paramount importance. Endangered language documentation includes many elements: raw audio and video recordings, photographs, transcription files, databases, files containing linguistic analysis and other research details, responses to experimental stimuli, and field observations. Together these files make up a collection of interlinked data for a particular project. For example, recordings go along with their transcriptions, and data in the transcriptions is added to databases. All of these kinds of data must be managed before they can be archived and made widely accessible, so researchers in the language sciences handle a large amount of interlinked data prior to depositing it in an archive. However, there are no best-practice guidelines for this type of collection and no standard tools for managing the files. As a result, current practices are inefficient and create bottlenecks that delay archiving. This project will use a series of workshops to bring together stakeholders in language documentation, including software developers, to develop standardized software tools for managing linguistic data collections, addressing the hold-ups that can prevent research products from being properly archived and thus publicly accessible. Such tools will facilitate a more robust and reproducible science of language by providing researchers with standard methods to manage data from the point of collection to the point of archive deposit. The aim is to eliminate the collection management bottleneck and to facilitate greater uptake of language archives.
The workshop series will bring together relevant stakeholders, including field linguists who collect data, theoretical linguists who make use of archival linguistic data, experts in data curation, and software developers. To encourage broad participation, the three workshops will be scheduled in conjunction with major gatherings of linguistic researchers, including the Linguistic Society of America annual meeting. The outcome of these workshops will be a sustainable plan for development of a cross-platform, open source collection management tool. By making data more accessible and better described, this tool will facilitate increased reproducibility and accessibility of linguistic research. This greater availability of primary language resources will transform not only various subfields of linguistics, but also related fields such as anthropology and social psychology, which rely on careful management of field data. Further, by taking a stakeholder-driven approach via a series of workshops, the project has the potential to encourage broad adoption of collection management tools by both the language documentation community and linguists from other subdisciplines. In doing so, the project will lower the barriers to proper description and archiving of a wide variety of linguistic data. Moreover, by improving the dialogue among language documenters, language archivists, linguists, and developers, this project will serve as a model for the development of software in linguistics, as well as in other social and behavioral sciences.
