EAGER: Prototyping an Urban Data Cyberinfrastructure for Computational Social Sciences
University of Chicago, Chicago, IL
Investigators
Abstract
Cities are the crucibles of civilization, and accelerating global urbanization raises challenges and opportunities related to density and scale in areas including transportation; food production and distribution; human health and wellbeing; education; social policy and services; and management of water and energy. In seeking to understand the human, social, and economic aspects of cities in order to help develop effective education or public policy, scientists, and thus city officials, have traditionally been limited to qualitative studies or to sparse, often stale data sources. The open data movement is making an increasingly rich set of urban data available, but the cyberinfrastructure technologies and tools used to publish this data were designed primarily to support the analysis of individual data sets rather than the exploration of relationships among many data sets. Consequently, urban scientists from sociology, economics, behavioral sciences, education, engineering, operations research, and other disciplines lack the tools and infrastructure to fully harness urban data for their research. The questions these researchers ask are therefore constrained by the data they have in hand. Two new cyberinfrastructure capabilities have the potential to unleash these data sources, both exploiting the fact that most published urban data sets share the attributes of location and time. The first is to allow a scientist to assemble data from multiple independent data sources for a specific geographic point (latitude/longitude), city unit (street segment, census tract, block), or area (polygon). The second is to select a window of time and normalize the selected data sources to a common sampling interval, merging them into a composite structure for computational and statistical analysis. Taken together, these capabilities will allow a scientist to study urban areas over specific time periods, with varied, relevant data represented as a time series of vectors.
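The first capability, selecting records that fall inside an area (polygon), might be sketched as follows. This is a minimal illustration and not the project's implementation: the record layout (dicts with "lon" and "lat" keys), the function names, and the sample coordinates are all hypothetical, and the standard ray-casting test is used here only as one common way to decide whether a point lies inside a polygon.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude); assumed layout


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray cast in the
    +x direction from the point crosses. An odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_in_area(records: List[Dict], polygon: List[Point]) -> List[Dict]:
    """Keep only the records whose (lon, lat) falls inside the polygon."""
    return [r for r in records if point_in_polygon((r["lon"], r["lat"]), polygon)]


# Hypothetical usage with made-up coordinates (a 4 x 4 square area).
area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
records = [
    {"id": "a", "lon": 2.0, "lat": 2.0},  # inside the square
    {"id": "b", "lon": 5.0, "lat": 2.0},  # outside the square
]
selected = select_in_area(records, area)
```

A point or city-unit query could reuse the same filter by substituting a small polygon around the point or the boundary polygon of the tract or block.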
We propose to develop, in partnership with urban scientists and city officials, initially from Chicago and eventually from New York City, a proof-of-concept prototype with these capabilities. The prototype will draw data from open data portals, allowing a researcher to specify a location, a window of time, a sampling period, and a list of data sets. The system will return a matrix with one row per time sample and one column per data set. By merging and transforming urban data into matrices, we will enable urban scientists to apply the tools of mathematics and computation to understand urban challenges ranging from youth violence and crime to graduation rates to employment and economic decline and revitalization.
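The matrix described above, with one row per time sample and one column per data set, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name build_matrix, the data set names, the representation of each data set as a list of event timestamps already filtered to the chosen location, and the choice to fill each cell with an event count. The abstract does not specify how values are aggregated within a sampling interval.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def build_matrix(
    datasets: Dict[str, List[datetime]],
    start: datetime,
    end: datetime,
    step: timedelta,
) -> Tuple[List[datetime], List[str], List[List[int]]]:
    """Resample several event streams onto a common sampling interval.

    Returns (row_times, column_names, matrix), where matrix[i][j] is the
    number of events from data set j in the interval
    [row_times[i], row_times[i] + step). Counting is an assumed aggregation.
    """
    names = sorted(datasets)  # fixed column order
    row_times = []
    t = start
    while t < end:  # one row per sampling interval in [start, end)
        row_times.append(t)
        t += step
    matrix = [[0] * len(names) for _ in row_times]
    for col, name in enumerate(names):
        for ts in datasets[name]:
            if start <= ts < end:  # drop events outside the time window
                row = (ts - start) // step  # timedelta // timedelta -> int
                matrix[row][col] += 1
    return row_times, names, matrix


# Hypothetical usage: two made-up data sets, a 3-hour window, hourly samples.
day = datetime(2024, 1, 1)
example = {
    "crime_reports": [
        day + timedelta(minutes=10),
        day + timedelta(minutes=50),
        day + timedelta(hours=2, minutes=30),
    ],
    "service_requests": [
        day + timedelta(hours=1, minutes=15),
        day + timedelta(hours=3),  # falls on the window end, so excluded
    ],
}
times, columns, m = build_matrix(example, day, day + timedelta(hours=3), timedelta(hours=1))
# columns == ["crime_reports", "service_requests"]
# m == [[2, 0], [0, 1], [1, 0]]
```

Each row of the result is one vector in the time series, so the output can be handed directly to numerical or statistical tools.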