EXTREEMS - QED: Directions in Data Discovery (Data Cubed) in Undergraduate Education
University of Colorado at Boulder, Boulder, CO
Investigators
Abstract
The Data Cubed project will prepare students for the challenges posed by the analysis of large datasets. As governmental, scientific, and business enterprises collect, store, and process more data, they encounter many technological challenges. The analysis of big datasets requires a collaborative effort among mathematicians, statisticians, computer scientists, and domain experts. Computation (including algorithmic development), modeling (including dimensionality/complexity reduction), and visualization are all needed. The Data Cubed project will identify talented students early in their academic careers and give them appropriate mentoring and increasingly advanced statistical and computational coursework. The students will proceed to data discovery research under the guidance of faculty members and partner scientists. The ultimate goals of the Data Cubed project are to increase the number of highly qualified undergraduate students who can apply their skills as they enter the scientific workforce and data analytics careers, and to share the results of this project with the broader community. Students will learn mathematical and statistical techniques and software systems to collect, generate, store, analyze, and visualize large amounts of data.
In the Data Cubed project, several new courses will be created to train students in the core computational and statistical areas that underpin the analysis of large datasets. The students will also be provided with significant research opportunities in geophysical modeling, analysis of unstructured social media data, and dimensionality reduction techniques and modeling. One of the research projects will use large geophysical datasets from the National Center for Atmospheric Research (NCAR) and will involve modeling heat stress in urban environments and its relationship to public health. Another will examine the role of oceans as a primary reservoir of heat for our planet, which plays a significant role in the dynamics of climate change. Yet another project will combine heuristics of social media data (e.g., tweets) during times of mass emergency with a user's social graph to develop a more comprehensive picture of the situation. These and other projects all require a fundamental understanding of how to analyze large datasets.