Collaborative Research: IIS-III: Small: Towards Fair Outlier Detection

$312,933 | FY2023 | CSE | NSF

University of California-Davis, Davis, CA

Abstract

Outlier detection is a common problem in machine learning and data mining, in which a collection of instances, records, or objects is analyzed and the system identifies those that stand out. It can be a controversial use of AI methods, because a typical outcome is to label an item or individual as unusual, often with negative connotations. Outlier detection is used extensively in fraud detection, surveillance, and policing across numerous domains. There are many outlier-detection algorithms, but they are typically not fairness-aware, meaning they could inadvertently discriminate against protected-status groups or subgroups, which often stand out from the norm. This award addresses the problem of encoding fairness into various types of outlier-detection algorithms, both traditional data-mining-based and modern deep-learning-based. Adding fairness to outlier detection will allow it to be used in a wider variety of tasks while ensuring that these algorithms do not discriminate. The project consists of three core tasks, to be evaluated on social media and medical imaging applications. The first task consists of defining how to measure fairness. The second task explores how to encode fairness for tasks such as auditing the output of an algorithm to identify unfairness and post-processing the results of an outlier-detection algorithm to meet fairness requirements. Finally, the third task explores adding fairness to modern deep-learning-based algorithms used for outlier detection.
Incorporating fairness considerations into machine-learning algorithms is an important and relatively understudied problem, potentially due to the wide variety of algorithm types. This project explores how to incorporate fairness mechanisms into a wide variety of outlier-detection algorithms. For more traditional outlier-detection algorithms, it explores auditing the algorithm to determine whether the output is unfair and then minimally post-processing the output to make it fairer. Doing so will involve formulating these problems as discrete optimization problems that search for examples of unfairness and for the instances to move between the outlier and normal classes to improve fairness. For deep learning formulations of outlier detection, the project will explore directly encoding fairness into the training algorithms via a number of different strategies, with a core goal of determining which is the most appropriate and useful. A particular challenge for deep fair outlier detection is that outliers can be presented in several ways: i) using thresholds, ii) as an ordered list, or iii) as a score. The project will study all three settings. It will evaluate the first type of outlier detection on a social media platform for account and content filtering (with SNAP), and the last two types on medical imaging applications that employ outlier detection for data preprocessing (with UC Davis Medical Center).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
