III: Small: Outlier Discovery Paradigm
Worcester Polytechnic Institute, Worcester MA
Investigators
Abstract
Modern applications, from financial transaction systems and smart health sensors to Internet of Things devices, collect staggering volumes of data. These data sets contain critical insights in the form of rare phenomena and anomalies that may indicate financial fraud, health alerts, or system failure. To separate the valuable from the spurious, analysts need to interactively sift through and explore this data deluge. By discovering anomalies, analysts may detect financial fraud, identify behavioral irregularities, or prevent catastrophic sensor failures, thus touching the lives of citizens in countless ways. While a treasure trove of stand-alone algorithms for detecting particular types of outliers exists, they tend to be variations on a theme. This research project is game-changing in that it will offer the first end-to-end outlier services that bring this wealth of algorithms to bear in an integrated infrastructure to support effective anomaly discovery. The broader impact of this project also includes integrating the PI's project activities with the training of a STEM workforce, contributing to the PI's WPI REU data science summer site, and contributing to the new interdisciplinary Data Science degree programs at the PhD, MS, and BS levels spearheaded and led by the PI. The PI has a long history of working with diverse student populations at all levels and is determined to similarly foster diversity among the participants involved in this project.
This research will go well beyond developing yet another outlier detection algorithm by instead demonstrating the feasibility of outlier discovery as a service. It will break fundamentally new ground in supporting outlier discovery from identification through refinement to explanation. The proposed end-to-end anomaly discovery paradigm will support all stages of anomaly discovery by seamlessly integrating outlier-related services within a single platform.
The result is a database-system-inspired solution that models services as first-class citizens for the discovery of outliers. It integrates outlier detection processes with data sub-spacing, explanations of outliers with respect to their context in the original data set, user feedback on the relevance of outlier candidates in the domain, and metric learning to refine the effectiveness of the outlier detection process. To establish the utility of the innovation, the system will be evaluated on outlier benchmark data sets as well as real-world data sets and workloads explored in partnership with industry collaborators. The resulting system will enable the analyst to steer the discovery process with human ingenuity, empowered by the platform's near real-time interactive responsiveness during exploration. Our solution aims to be the first to achieve the power of sense-making afforded by outlier explanation services and human feedback integrated into the discovery process. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.