III-COR-Small: Multi-Relational Data Clustering with Probabilistic Mixture Models
University Of Minnesota-Twin Cities, Minneapolis MN
Abstract
With widespread attempts to apply data mining methods to real-life problems, there is an increasing realization that real-life data is often multi-relational, involving observations connecting multiple entities through a set of relations. A central problem in several applications involving multi-relational data is to simultaneously find clusters of objects across related entities, e.g., customer clusters and related product clusters in e-commerce, movie clusters and related user clusters in recommendation systems, and communities and shared content in social networks. The key novel aspect is that the clustering of objects in an entity, such as the set of movies or users, depends on its relationships with objects in other entities, e.g., users are similar if they like similar movies, and vice versa. The primary goal of this project is to develop a unified statistical approach to multi-relational clustering and related problems in multi-relational data analysis. Towards this end, the project investigates a family of novel statistical multi-relational mixture models, with a focus on additive and multiplicative models for multi-relational clustering. Owing to their modular design, both additive and multiplicative models can incorporate domain-specific semantics as well as automatic model selection using appropriate Bayesian priors. Further, the project investigates efficient variational inference methods appropriate for discovering latent multi-relational clusters. The project significantly empowers the knowledge discovery component of data mining. Crucial clues to understanding observed data in several disciplines, including the social, biological, and information sciences, are often spread across multiple related observations. The project enables a statistical approach to detecting latent structure in such multi-relational data, which is an important step towards knowledge discovery from multiple related data sources.
The project plays an important role in developing closer collaboration across disciplines and broadening participation in computer science. Building on the increasing awareness of the ubiquity of multi-relational data, the project contributes to the development of appropriate educational material for the next-generation workforce. Further information on the project may be found at the project web site: http://www.cs.umn.edu/~banerjee/multi-relational.