Learning Fundamental Atomic Image Structures From Natural Images, Video and Shapes

$340,058 | FY2002 | CSE | NSF

University Of California-Los Angeles, Los Angeles CA

Investigators

Abstract

As the objective of vision (human and machine) is to compute a hierarchy of increasingly abstract interpretations of observed images or image sequences, it is of fundamental importance to know which concepts are used at each level of interpretation. In plainer language: what are the visual "strokes", visual "characters", and visual "words"? Or, what are the visual "electrons", "atoms", and "molecules"? The goal of the proposed research is to discover dictionaries of visual concepts, at various levels, that correspond to the fundamental topological, photometric, geometric, and dynamic structures of images and scenes. In mathematical terms, these structures are low-dimensional manifolds embedded in the very high-dimensional image space. More specifically, we propose to construct top-down generative models for natural images, 3D surfaces, human faces, video sequences, and 2D shape contours. The fundamental atomic structures are defined by parameters in the generative models, and these parameters are estimated by fitting the models to training data. These structures are intrinsic to the ensemble of natural images and video. We propose stochastic (Markov chain Monte Carlo) learning algorithms that are capable of computing globally optimal solutions.
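The abstract does not specify its generative models or samplers. As a loose illustration only, the sketch below fits the three parameters (center, width, amplitude) of a hypothetical 1-D "blob" atom to a synthetic signal by minimizing squared reconstruction error with Metropolis moves under a simulated-annealing temperature schedule. Occasional global moves redraw the center uniformly so the chain can escape poor initial fits. The names `gaussian_atom`, `energy`, `anneal_fit`, the blob model, and every setting are assumptions made for this example, not the investigators' method. Annealing is only guaranteed to reach a global optimum in the limit of very slow cooling; a finite run like this one offers no such guarantee.

```python
import math
import random

MIN_WIDTH = 0.05  # hypothetical floor that keeps the blob width positive


def gaussian_atom(n, center, width, amp):
    """Render a hypothetical 1-D 'blob' atom sampled at t = 0, 1, ..., n - 1."""
    return [amp * math.exp(-((t - center) ** 2) / (2.0 * width ** 2)) for t in range(n)]


def energy(signal, params):
    """Sum of squared residuals between the signal and the rendered atom."""
    center, width, amp = params
    model = gaussian_atom(len(signal), center, width, amp)
    return sum((s - m) ** 2 for s, m in zip(signal, model))


def anneal_fit(signal, init, steps=10000, t0=1.0, t_end=1e-3,
               step_size=0.5, jump_prob=0.1, seed=0):
    """Minimize energy() over (center, width, amp) with annealed Metropolis moves.

    Requires steps > 1. Returns the lowest-energy parameters visited and that energy.
    """
    rng = random.Random(seed)
    n = len(signal)
    current = list(init)
    e_cur = energy(signal, current)
    best, e_best = list(current), e_cur
    for i in range(steps):
        # Geometric cooling from t0 down to t_end.
        temp = t0 * (t_end / t0) ** (i / (steps - 1))
        if rng.random() < jump_prob:
            # Global move: redraw the center uniformly, keep width and amplitude.
            proposal = [rng.uniform(0.0, n), current[1], current[2]]
        else:
            # Local move: Gaussian random walk on all three parameters.
            proposal = [p + rng.gauss(0.0, step_size) for p in current]
        if proposal[1] < MIN_WIDTH:
            continue  # reject widths below the floor outright
        e_new = energy(signal, proposal)
        # Metropolis rule: always accept improvements, sometimes accept worse fits.
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / temp):
            current, e_cur = proposal, e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


if __name__ == "__main__":
    # Synthetic, noise-free "image": one blob at center 40, width 3, amplitude 2.
    signal = gaussian_atom(64, 40.0, 3.0, 2.0)
    best, e_best = anneal_fit(signal, init=[10.0, 5.0, 1.0], seed=1)
    print("recovered (center, width, amp):", [round(p, 2) for p in best])
    print("residual energy:", round(e_best, 4))
```

The high early temperature lets the chain accept worse fits while it searches for the blob, and the cooling then concentrates it near the best fit it has found. Real image models with many interacting atoms would need far richer proposals than this toy uses.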
