RI-Small: Multi-level Priors for Multi-view Stereo

$316,000 | FY2008 | CSE | NSF

University of Washington, Seattle, WA

Investigators

Abstract

Humans are remarkably good at perceiving shape, even in cases where the image cues alone would seem to be insufficient. For example, up to certain distortions, people can infer the structure of a scene from a single photograph. Similarly, they can often estimate the shape of surfaces that are only partially visible, such as a chair that is half-occluded. These abilities are due to the human capacity to combine visual cues with prior information about the shapes of objects and surfaces in the world. In computer vision, modern multi-view stereo algorithms that exploit very low-level cues now produce shape models that are proving to be nearly as accurate as laser scans, and they do so in very unconstrained settings, e.g., using photos from Internet sharing sites. However, these algorithms lack the other piece of the puzzle: the ability of the human visual system to exploit prior information about scene shape. In this work, the PIs focus on architectural scenes, a domain in which prior notions of shape are particularly applicable. Existing priors typically fall into two categories: low-level, usually a preference to reconstruct smooth surfaces, and high-level, such as model-based techniques that have parameterized templates for specific architectural features. The PIs are exploring a range of priors between these extremes, significantly increasing the expressiveness of low-level priors and defining a set of mid-level priors. The key ideas are to consider the differential properties of architectural surfaces (e.g., curvature behavior) and to exploit the symmetries that frequently occur in this setting. The PIs are studying a range of reconstruction problems and applications, from single-view reconstruction to multi-view stereo, that exploit priors and prior selection.
Finally, the PIs are evaluating the potential of these techniques to reconstruct detailed architectural models from street-level, aerial, and interior views from the Internet. As part of this evaluation, they are obtaining ground truth laser scans and registered imagery to provide a benchmark for the research community. The outcome of this research, i.e., tools that can automatically reconstruct geometric models from large collections of images, will enable a host of important applications, ranging across 3D visualization, localization, communication, recognition, and cultural heritage, that go well beyond traditional computer vision problems and can have broad impacts for the population at large. http://grail.cs.washington.edu/projects/cpc/
