CAREER: Exploiting Deep Generative Models for Visual Recognition

$462,567 | FY2023 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

Modern visual recognition systems achieve impressive results on standard benchmarks and work reliably for common objects and scenes, given massive data and annotations. However, current systems struggle to detect rare or unseen objects and fail to adapt to new domains. Researchers, engineers, and domain experts must capture and annotate huge amounts of real data, which is costly for common objects and impractical for rare objects and corner cases (i.e., cases that arise when multiple unique conditions occur simultaneously). To address these challenges and automatically create and label data that fully depict the corner cases, this project leverages the rich compositional structure and powerful synthesis capacity of large-scale generative models. Using these models, which can quickly synthesize diverse objects and scenes with unknown visual elements (e.g., new poses, weather, and lighting), this project will develop recognition algorithms that can recognize rare or unseen objects and adapt to continuously changing environments. This project has the potential to be transformative for various applications, such as autonomous driving, assistive robots, healthcare, e-commerce, and mixed reality. Furthermore, this research will translate into code, models, courses, and tutorials that are widely accessible to diverse stakeholders, and into education and research programs that engage with the broader community.

Directly using generative models is challenging, as it is highly unlikely that a randomly sampled image will cover a corner case that can improve recognition systems. To synthesize data that more closely resemble the long-tail distribution and new domains, this project will focus on three research thrusts. First, the project addresses learning visual recognition via generative models by exploring different methods of automatically generating data and annotations. Second, the project will analyze visual recognition systems through generative models by synthesizing diverse, continuously evolving test data to interrogate the system and understand its biases. Third, the project will automatically select and adapt generative models to new domains and tasks. These three thrusts are tightly connected: once the algorithms identify hard examples on which the current system fails, these examples can be used to close the loop between training and analysis. Finally, the investigators will evaluate the developed methods by comparing methods that do and do not use generative models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
