
CAREER: Teaching Machines to Recognize Complex Visual Concepts in Images through Compositionality

$149,760 | FY2021 | CSE | NSF

University Of Virginia Main Campus, Charlottesville VA

Investigators

Abstract

Modern computational systems for image recognition can be taught to detect objects from large sets of categories. However, to teach a machine each new category, human operators must annotate a large number of images with category labels. In practice, many applications require a custom set of categories. For instance, a visual recognition model that detects different types of furniture for an e-commerce application might require very specific categories such as ‘rocking chair’, ‘swivel chair’, ‘accent chair’, or ‘swivel accent chair’. Even a domain expert who has a clear idea of which visual characteristics matter for each type of chair would still have to teach the system by annotating images one at a time. The goal of this project is to enable richer modes of interaction in which ‘machine teachers’ can guide the image recognition system through direct feedback on the visual characteristics that are important for each new category. To this end, we plan to exploit principles of compositionality, under which new categories are defined in terms of basic concepts that are easier to recognize. The project will integrate research with education and involve undergraduate students from underrepresented groups in the research.

This project will devise new models that learn to recognize visual concepts compositionally by first discovering and then learning to recognize visual primitives that are shared across many classes. This process will also be tailored to maximize utility in a setting where a user can guide the model through natural interactions, including the use of language and direct manipulation through a visual interface.
The project will (1) develop methods to learn compositionally and interactively from textual descriptions, (2) propose methods to automatically discover primitives that are composable across categories, and (3) propose models that can support interactions even after deployment. These three research aims will be complemented by a comprehensive evaluation plan, a public platform that exposes our methods in an interactive environment, and broadening-participation activities. This research effort will introduce novel designs for visual recognition models that offer people more expressive ways to guide and train them.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
