CAREER: Unified Model-agnostic Interpretation Framework for Deep Predictive Models

$440,898FY2023CSENSF

George Washington University, Washington DC

Investigators

Abstract

Deep learning models have achieved exceptional predictive performance in a wide variety of tasks, ranging from computer vision to language processing to medical images. Many organizations across diverse domains are now building large-scale applications based on deep learning. However, there are growing concerns, regarding the fairness and trustworthiness of these models, largely due to the opaque nature of their decision processes. For example, when the trained deep learning model correctly classifies a tumor in a target medical image, the part of the X-ray image that the model learned to identify the tumor in must be understood to ensure the findings are valid. Providing accurate and reasonable interpretations is therefore urgently needed for selecting and deploying trustworthy deep learning models. This project will design and develop a universal interpretation framework that can be applied to a variety of fields for deep learning applications. The interpretation framework can produce feedback on what scientific knowledge is perceived by the Deep Neural Networks (DNNs) and hence helps researchers refine models by identifying, minimizing, or even eliminating unfairness and bias. This project will also spend significant efforts on education activities, focusing on three key areas: (1) professional development for K-12 teachers, (2) deep learning summer camp for high school students, and (3) mentoring undergraduates for research. These educational and outreach activities will build bridges among high school students, K-12 teachers, and colleges that will eventually benefit both science and society. This project will develop a novel interpreting framework that enables humans to understand the decision process of increasing complex black-box DNNs trained on medical images, videos, natural language processing and deep reinforcement learning. Although progress has been achieved on DNN interpretation, several unique challenges remain unexplored for the aforementioned domains: (1) 3D medical images, which are highly structured and usually require domain knowledge and are difficult to explain. (2) Video interpretation cannot be achieved by simply applying existing image interpretation methods. (3) Most existing NLP interpretation models require certain knowledge of the internal structure of the neural networks. (4) Current DRL interpretation heavily relies on decision trees imitating action samples, which cannot guarantee to minimize policy regret. This project will address these challenges in the following ways: (1) Interprets 3D medical images by a novel graphical representation to create correlated interpretations among neighboring slices of 3D images; 2) Devise saliency estimating procedures for video-based tasks in both spatial and temporal domain; (3) Designs a novel text perturbation scheme via embedding space to identify important words of NLP models; (4) Interprets agent's behaviors and elucidates the strategies that agents learn to balance short-term and long-term reward; (5) Develops an all-encompassing interpretation framework to provide interpretations for arbitrary deep learning models through a series of pilot applications. The specific research tasks will be extensively evaluated in trustworthiness, performance comparison, and interpretability to human beings. All the research outcomes will be disseminated publicly to facilitate a better understanding of explainable deep neural networks. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →