III: Small: Collaborative Research: Explaining Unsupervised Learning: Combinatorial Optimization Formulations, Methods and Applications

$234,999FY2019CSENSF

University Of Virginia Main Campus, Charlottesville VA

Investigators

Abstract

Clustering is a common machine learning and data mining technique which takes a collection of instances/records/things and divides them into groups. It is used in a large variety of domains including social networks (to find communities), biology (to create taxonomies) and neuroscience (to find regions of interest in the brain). There are many clustering algorithms already in existence, but these algorithms do not always explain the clustering. This award addresses the problem of describing the clustering algorithm results to a variety of stakeholders including data scientists, domain scientists and the general public. Explaining the results of these algorithms will allow them to be better understood by stakeholders and allow their use in challenging and sensitive domains where transparency is required. Explanations are given in an initiative form using easy to understand auxiliary information such as tags. This project will consist of three inter-twined tasks. The first will develop easy to understand mechanisms to explain a clustering, whilst the second will allow a human to interact with the explanation by asking queries about it. Finally the third task will attach measure of stability, trust and correctness to the explanations generated from task one. The area of unsupervised learning is immensely popular due to the lack of need for labeled data and there exist many algorithms that can work on a variety of data types: images, graphs, documents, spatial and temporal data. Many domains have a preferred/well-accepted clustering algorithm. However, most algorithms provide just a grouping of the instances/objects into clusters with limited description. The work on describing and/or explaining a solution has gained popularity in the supervised learning context but is under-studied in the unsupervised context. This award explores these novel explanation problems through discrete combinatorial optimization formulations. Such formulations help in developing explanations requiring interpretable (hence discrete) results, best possible explanations (not any plausible explanation) and in enforcing complex constraints to make explanations match human expectations. This research will leverage much work in theoretical computer science and use tools from declarative paradigms such as ILP solvers and constraint programming languages. Such tools allow for easy modifications of formulations, a desirable trait as different domains may need different variations in explanation. The usefulness of the techniques will be demonstrated through their applications to several domains including social networks and genomic data and evaluated by two domain experts in the area. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →