Collaborative Research: RI: Medium: Learning Compositional Implicit Representations for 3D Scene Understanding

$409,999 · FY2022 · CSE · NSF

Cornell University, Ithaca NY

Abstract

Scene understanding systems take visual inputs, like images or videos, and reconstruct and interpret the underlying scene in terms of 3D structure, objects like cars and people, and other scene properties. Such systems are crucial in applications in computer vision, computer graphics, and robotics, including in self-driving cars. To represent the 3D world as observed from the input imagery, such systems use mathematical models, and in recent years neural networks have been very popular as the models used in such systems, due to their expressiveness and ability to capture fine details. However, current neural network-based scene representations are only good at modeling the specific conditions under which a scene was observed, and cannot generalize to new scenarios, limiting their use in many applications. For example, if a self-driving car is trained to model scenes using only images from sunny days, the car’s perception system might break down on rainy or snowy days. This project aims to introduce new scene modeling techniques that will enable machines to perceive and reconstruct 3D scenes in a more generalizable way. The investigators will integrate findings from this research into course development and student advising, and partner with educational and non-profit organizations to teach AI, vision, and graphics to underrepresented students.

In this project, investigators will explore new methods that will make representations capable of encoding more structure (e.g., light field) and root them in physics. Designing such representations requires knowledge from AI, computer vision, and computer graphics. The key innovations include a new class of scene representations that aims to bridge the ability of implicit neural representations to capture scene details with that of physical representations to model scene structure; new methods that infer the representation from raw images and videos with new parametrizations to enable data-efficient, self-supervised learning; and new methods that leverage the representation for downstream computer vision and graphics tasks, such as interactive design and scene synthesis.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
