III: Small: Social Discovery of Users and Content in Social Media Through Similarity-Based and Graph-Based Inference of Attributes and Queries

$500,000 · FY2016 · CSE · NSF

University of Illinois at Urbana-Champaign, Urbana, IL

Investigators

Abstract

This project aims to develop graph-based tools for discovering the people and content in social media that are most relevant to a given question or goal. This is an important problem for applications including marketing, event detection, civic participation and governance, and disaster management. It is also a hard problem: there is a great deal of content and a great many people, but too little information about their attributes and attitudes to make good choices about whom to listen to. The research team proposes to label people and content using a similarity, or generalized "homophily," principle: the idea that the closer people are to each other (as friends, neighbors, or Twitter followers), the more likely they are to have things in common. The research team will develop SocialSense, a tool that uses this idea to infer labels for people and content based on how they are connected to their neighbors; these inferred labels will allow users to create more complex and accurate queries in social media. The team will work with existing partners to develop SocialSense and validate that it performs better than current social media tools; they will also use the tool to support undergraduate and graduate classes on web search and databases.

For the problem of finding users, the team will represent users and content as nodes in a large social graph, in which each node and each edge between nodes has a set of demographic and attitudinal attributes. This project develops novel algorithms for graph-based learning and mining over social graphs as well as content graphs. By mining patterns of connection in the network, the team will identify a set of structural motifs of homophily and use those motifs, together with the underlying probabilities of attribute occurrence, to propagate inferences about those attributes to other nodes. The team will check the quality and fairness of those inferences using a rejection sampling technique.
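
The homophily idea above can be illustrated with a minimal sketch. This is not the SocialSense algorithm, which the abstract does not specify; it is a plain neighbor majority vote, and it does not model structural motifs, attribute priors, or the rejection sampling check. The function name `propagate_labels`, the toy graph, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(edges, seed_labels, max_rounds=10):
    """Infer labels for unlabeled nodes by majority vote among labeled neighbors.

    Illustrative assumption only: a simple stand-in for homophily-based
    inference, not the method developed in the project.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(neighbors):
            if node in labels:
                continue  # known and already-inferred labels are never overwritten
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if votes:
                # Most common label wins; ties are broken alphabetically.
                updates[node] = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        if not updates:
            break  # nothing new could be inferred this round
        labels.update(updates)  # apply all of this round's inferences at once
    return labels


# Hypothetical toy network: three people have a known attribute, three do not.
edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),
]
seeds = {"ann": "cyclist", "bob": "cyclist", "fay": "driver"}
inferred = propagate_labels(edges, seeds)
```

In this toy network, `cat` is connected to two known cyclists and so inherits that label, while `dan` and `eve` take their label from `fay`. Updates are applied once per round, so an inference made in a round does not influence other nodes until the next round.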
For finding content, the team will represent content and queries in a graph and again mine common patterns, this time to create query templates that will support the creation of future queries as new topics and entities arise. Finally, the team will integrate these components, creating a system that supports querying across people and content and suggests interesting new queries based on discovering patterns of connected attributes that match the motifs and templates described above. The team will evaluate the methods and system through both offline back-end performance measurements and online deployments that evaluate usability, expressiveness, and simplicity of the systems in the context of their partnerships with a smart nation/citizen input project and a social mapping cloud service run by their institution.
