RI: Small: Panoptic 3D Parsing in the Wild

$500,000 | FY2021 | CSE | NSF

University of California-San Diego, La Jolla, CA


Abstract

Humans have a remarkable capability to recognize and understand 3D objects and scenes, owing to effective (though not yet fully understood) representations that encode the intrinsic 3D world from its 2D projections. One of the main objectives in computer vision is to develop systems that can "see" the world. This project points to a new direction, panoptic 3D parsing (Panoptic3D), that jointly performs semantic segmentation, object detection, depth estimation, 3D shape reconstruction, and 3D layout estimation for single-view RGB images of natural scenes. The rapid development of 2D and 3D image modeling, representation learning, and deep models, together with large-scale cross-modality datasets, provides an unprecedented opportunity to build Panoptic3D systems. A Panoptic3D system can also assist scientific studies and experiments in disciplines beyond computer science, such as cognitive science, neuroscience, health care, transportation/civil engineering, mechanical engineering, and computational biology.

This project lays out a roadmap for building a novel system, Panoptic 3D Parsing (Panoptic3D), that jointly performs semantic segmentation, object detection, instance segmentation, depth estimation, 3D shape reconstruction, and 3D layout estimation for single-view RGB images in the wild. The problem of image understanding and 3D (shape and layout) reconstruction from a single-view image is deeply rooted in decades of development in computer vision and photogrammetry. The project is inspired by recent developments in holistic image understanding and single-view 3D shape/layout reconstruction, the availability of large-scale 2D/3D image datasets, and successes in deep learning and representation learning. A number of technical innovations will be made by developing new 3D modeling and computing algorithms that address the absence of comprehensive multi-modality ground-truth annotations (segmentation, objects, 3D shapes, and 3D layout) for natural images in the wild. The potential gain of pursuing this new direction is substantial, and the proposed Panoptic3D system is applicable to a range of domains including computer vision, computer graphics, autonomous driving, mapping, robotics, human-computer interaction, and augmented reality.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
