RI: Small: Inferring Non-Rigid Geometry from Object Categories

$460,000 | FY2015 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

This project integrates new theoretical developments in group sparse coding and non-rigid structure from motion (NRSFM) within model-based methods for computer vision. Geometry is at the heart of visual perception. Humans invert the projection from 3D to 2D effortlessly, blissfully ignorant of the mathematics required to make such an inversion possible. Computer vision has been striving to unlock these mathematical secrets for the past few decades, on the view that any machine that truly "sees" must be able to perform a similar inversion from 2D to 3D. Inferring the camera position and the 3D structure of a scene or object from an ensemble of 2D projected points is known within computer vision as structure from motion (SFM). By definition, a static 3D structure is rigid; however, the set of 3D structures sharing the same object category label is inherently non-rigid, which makes large-scale NRSFM crucial for model-based category classification and detection. Model-based methods for object category classification and detection attempt to understand the interplay between an object's projected photometric appearance and its underlying geometry. These methods, however, have largely been abandoned in computer vision over the last two decades in favor of methods that rely solely on appearance (i.e., view-based approaches). As computer vision and robotics continue to merge, it is becoming increasingly important not only to recognize an object but also to understand how to grasp or interact with it, a task much better suited to a model-based methodology. Further, as augmented reality becomes more sophisticated, it is clear that 3D understanding of a scene or object is crucial, something that model-based approaches to perception naturally provide. Finally, vision machines demand an increasingly deep understanding of how the visual world is allowed to vary during learning. A model-based approach can naturally accommodate this type of 3D geometric variation within a learning framework.
