RI: Small: Exploring Rationale behind Visual Understanding: Combining Attention and Reasoning

$285,029 | FY2019 | CSE | NSF

University of Minnesota-Twin Cities, Minneapolis, MN

Abstract

Recent progress in deep learning has resulted in models that show significant performance gains in computer vision tasks. This project aims to bridge the gap between the increasing performance of intelligent systems and the lack of understanding of their complex task-solving process. With the overarching goal of understanding and modeling that process, this project studies two intertwined mechanisms heavily involved in task-solving -- attention and reasoning -- and develops a sound framework to integrate the two. It will serve as a critical step toward untangling the process of solving a visual task and alleviating the black-box problem in machine learning. The research will build attention and reasoning capabilities into machines, thus empowering applications across a broad spectrum of artificial intelligence tasks, including medical diagnosis and treatment, robotics, and education. The principal investigator will organize workshops and seminars and make project results publicly available. The project also aims to integrate research and education with a focus on increased diversity, through K-12 outreach activities, student mentoring, and curriculum development.

This project focuses on both dataset and model development, as well as on enabling new methods for network visualization, interpretation, and diagnosis. More specifically, the project develops: (1) a new dataset with human eye movements and textual explanations, to understand the critical factors that contribute to task performance; (2) a framework whose models take a first step toward demonstrating the task-solving process by exhibiting attention and reasoning capabilities; and (3) a novel layer-wise network diagnosis method that considers both the performance and the interpretability of each network layer. Addressing these questions will not only boost model performance but also open the black box of both the decision-making process for a visual task and the structure of deep neural networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
