RIDIR: Modernizing Political Event Data for Big Data Social Science Research
University of Texas at Dallas, Richardson, TX
Investigators
Abstract
The project creates a general research platform for studying civil protests, international conflict, and civil unrest using texts in Spanish, Arabic, and French, in addition to English. This expands the development of programs, data, and services for coding regional conflict and cooperation beyond the current English-only approaches, enabling data-rich research that will advance new approaches to core questions in the social, behavioral, and economic sciences. The project includes an openly available website that supports extracting and reporting conflict events across the globe and identifying their causes and diffusion. The project's data and methods support data-driven decisions about foreign policy, civil war prevention, human rights policies, and the effects of other factors, such as environmental or economic policies, on these phenomena. The project creates large-scale civil and inter-state conflict measures that cover multiple news sources and apply a common methodology within an open framework. Using multiple news data sources reduces the biases inherent in coding from a single news source or a small set of sources, which was a common approach in the past. The project aims to facilitate the coding of more, and better, data across languages, space, and time, thereby enabling the study of substantive questions in traditionally underrepresented countries, peoples, and topics. Usability considerations also motivate new user-interface software for working with big data of the kind proposed in this research, as well as server-side optimizations that scale large datasets across a diverse set of users. The event data, which cover multiple years and draw on large-scale news databases, will comprise many millions of observations over space and time. Research tools, data extraction, and other user interfaces are developed so that the relevant research communities can access, query, and obtain citation streams for these data.
Finally, machine-coded data from news reports are validated across news sources, languages, actors, and ontologies, and against human-coded gold standard records. The research and data serve as inputs for understanding the effects of climate on spatio-temporally referenced civil conflict events in Latin America, Africa, the Middle East, and worldwide.