
Link Mining and Discovery

$963,167 · FY2003 · CSE · NSF

University of Maryland, College Park, College Park, MD

Abstract

This project examines several statistical inference tasks for data best described as a collection of heterogeneous linked objects. For each task, models are proposed, algorithms for both learning and making inferences using the models are developed, and extensive empirical evaluation is performed. New approaches to mining linked data are needed because traditional statistical learning algorithms often make independence assumptions that are inappropriate for linked data. The first task studied is predicting the classification of an object based on attributes of the object and on a description of the objects to which it is linked. We are developing efficient collective classification algorithms that take into account the dependence between the object classifications. The next task examined is the use of labeled and unlabeled data for link-based classification. The use of unlabeled data to improve classification performance in propositional domains has received considerable attention in recent years; the use of unlabeled data for prediction in linked data is even more interesting, as it gives information about both the object distributions and the link distributions. The third task examined is link-based cluster analysis, that is, finding similar objects based on both the object attributes and the link properties. Results from this work will provide insight into effective statistical analysis for large linked heterogeneous domains. This has applications for discovering patterns in social networks, including criminal and terrorist networks; in biological data such as epidemiological studies; and in a wide range of other collections of heterogeneous linked data.
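As a rough illustration of the first task, the sketch below shows one generic form of collective classification: an iterative scheme in which each object's label is re-estimated from its own attribute-based scores plus the current labels of the objects it links to. This is not the project's algorithm. The function name `collective_classify`, the `link_weight` parameter, the scoring rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def collective_classify(local_scores, edges, n_iters=10, link_weight=0.5):
    """Hypothetical iterative collective classification on an undirected graph.

    local_scores: {node: {label: score}}, attribute-only evidence (assumed given).
    edges: iterable of (u, v) pairs linking nodes.
    link_weight: how strongly neighbor labels influence a node (an assumption).
    """
    neighbors = {node: set() for node in local_scores}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Bootstrap: label every node from its own attributes only.
    labels = {n: max(s, key=s.get) for n, s in local_scores.items()}

    for _ in range(n_iters):
        new_labels = {}
        for node, scores in local_scores.items():
            # Count the current labels of linked nodes.
            votes = Counter(labels[m] for m in neighbors[node])
            degree = max(len(neighbors[node]), 1)
            # Combine attribute evidence with the fraction of neighbors per label.
            combined = {
                label: score + link_weight * votes[label] / degree
                for label, score in scores.items()
            }
            new_labels[node] = max(combined, key=combined.get)
        if new_labels == labels:  # stop once the labeling is stable
            break
        labels = new_labels
    return labels


# Toy data (made up): "c" leans slightly toward "y" on attributes alone,
# but two of its three links go to nodes labeled "x".
local_scores = {
    "a": {"x": 0.9, "y": 0.1},
    "b": {"x": 0.8, "y": 0.2},
    "c": {"x": 0.45, "y": 0.55},
    "d": {"x": 0.1, "y": 0.9},
}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

labels = collective_classify(local_scores, edges)
print(labels)
```

In this toy run, an attribute-only classifier would label "c" as "y", while the link-aware pass relabels it "x" and leaves "d" as "y". That is the kind of dependence between linked classifications that an independence assumption would miss.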
