III: Small: Multi-Dimensional Structuring, Summarizing and Mining of Social Media Data
University of Illinois at Urbana-Champaign, Urbana, IL
Investigators
Abstract
Various kinds of social media have changed how billions of users across the globe obtain and share information. This creates great opportunities but also poses tremendous challenges for understanding, summarizing, and mining such data, owing to its huge volume and the dynamic, unstructured nature of its text content. In response to these challenges, this project focuses on text-based social media and proposes a multi-dimensional data structuring approach that mines unstructured social media data to uncover its hidden multi-dimensional structures. The project investigates principles, methodologies, and algorithms for social media structuring, summarizing, and mining, and develops effective and scalable technology for multi-dimensional social media data analysis. The principles and methodologies developed in this study can also be extended to scalable, multi-dimensional analysis of other kinds of massive unstructured data. To conduct effective multi-dimensional social media structuring, this project develops a distant supervision-based methodology that requires minimal human curation and labeling. It takes data in Wikipedia, Freebase, or other knowledge bases as references; integrates social media data with the corresponding news or other relevant documents; conducts phrase mining and entity and event discovery and typing; and uncovers critical aspects, attributes, and values associated with such entities and events from social media. Once social media data is organized in a structured way, massive social media can be summarized effectively in a context-aware semantic OLAP (online analytical processing) framework and analyzed systematically under a general multi-dimensional social media querying and mining framework for many tasks, such as modeling behavioral patterns, uncovering bursty events, and detecting social fraud or anomalies.
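To make the OLAP idea concrete, the following is a minimal sketch of a roll-up over posts that have already been structured into dimensions. It is not the project's implementation: the dimension names (`entity_type`, `topic`, `day`), the sample records, and the helpers `roll_up` and `slice_posts` are all hypothetical, chosen only to illustrate counting posts at different levels of granularity.

```python
from collections import Counter

# Hypothetical structured records: each post is assumed to have been mapped
# to dimension values by an upstream structuring step (illustrative data only).
POSTS = [
    {"entity_type": "person", "topic": "election", "day": "2015-03-01"},
    {"entity_type": "person", "topic": "election", "day": "2015-03-02"},
    {"entity_type": "organization", "topic": "election", "day": "2015-03-02"},
    {"entity_type": "organization", "topic": "sports", "day": "2015-03-02"},
]


def roll_up(posts, dims):
    """Count posts grouped by the chosen dimensions (an OLAP-style roll-up)."""
    counts = Counter()
    for post in posts:
        # The cell key is the tuple of this post's values on the chosen dimensions.
        key = tuple(post[d] for d in dims)
        counts[key] += 1
    return dict(counts)


def slice_posts(posts, dim, value):
    """Keep only the posts whose value on one dimension matches (an OLAP slice)."""
    return [p for p in posts if p[dim] == value]


# Coarse view: one count per topic.
by_topic = roll_up(POSTS, ["topic"])

# Finer view: drill down from topic to (topic, day).
by_topic_day = roll_up(POSTS, ["topic", "day"])

# Slice to a single topic, then roll up by entity type.
election_by_type = roll_up(slice_posts(POSTS, "topic", "election"), ["entity_type"])
```

In this sketch, a jump in the per-day count for one topic cell is the kind of signal a bursty-event detector might examine, although the abstract does not specify how such detection is performed.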