RI: Small: Enabling robust visual intelligence using propagators to model human competence
Massachusetts Institute of Technology, Cambridge, MA
Investigators
Abstract
The investigators approach the question of robust intelligence by asking what it is that makes humans both intelligent and robustly intelligent. Part of the answer is that humans are uniquely able to see, to report on what they see, and to use visual events, both real and imagined, to answer questions on demand and to develop a common-sense understanding of the physical world. If robust human-level intelligence is to be understood and engineered, then it is necessary to understand human visual competence. To understand human visual competence, it is necessary to understand how the architecture of the brain enables fragmentary and ambiguous perceptions to be brought into alignment with expectations so as to produce an understanding of the visual world. To take understanding of human vision to the next level, the investigators model the human visual system using the propagator paradigm, a label for a collection of ideas suited to the computational problems faced by vision systems. Propagators themselves are stateless, which makes them appropriate for operation on retinotopic arrays. Also, propagators connect cells in which information monotonically increases, assuring convergence. Most importantly, bidirectional information flow lies at the core of the propagator paradigm, so when the paradigm is augmented with new capabilities tailored specifically to vision processing, visual information flows not only from the bottom up but also from the top down and, in general, from any module to any other module, just as information flows to and from the many brain centers devoted to vision in the human brain. The investigators note, for example, that the lateral geniculate, a relay station for information flowing from the retina to primary visual cortex, receives most of its input from the primary visual cortex itself. The investigators are motivated not only by a desire to understand human vision, but also by a desire to build vision applications now far beyond the state of the art.
To drive their work, they concentrate on a dynamic scene understanding problem: given an urban scene and one or more stationary cameras, recognize actions such as walk, run, stop, put down, pick up, drop, give, take, follow, enter, and leave.
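The three propagator properties named in the abstract (stateless propagators, cells whose information only increases, and bidirectional flow) can be illustrated with a minimal sketch. This is not the investigators' system: the use of numeric intervals as partial information, and every name below (`Interval`, `Cell`, `propagator`, `sum_constraint`, `run_to_quiescence`), are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Partial information about a number: it lies somewhere in [lo, hi]."""
    lo: float
    hi: float

    def intersect(self, other):
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError("contradiction between sources of information")
        return Interval(lo, hi)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)


UNKNOWN = Interval(float("-inf"), float("inf"))


class Cell:
    """Holds accumulated information; content can only narrow, never widen."""

    def __init__(self, name):
        self.name = name
        self.content = UNKNOWN
        self.watchers = []

    def add_content(self, info, agenda):
        merged = self.content.intersect(info)  # monotonic merge
        if merged != self.content:
            self.content = merged
            agenda.extend(self.watchers)  # wake dependents only on a change


def propagator(inputs, output, fn):
    """Stateless: reads input cells, writes one output cell, keeps nothing."""
    def fire(agenda):
        output.add_content(fn(*(c.content for c in inputs)), agenda)
    for c in inputs:
        c.watchers.append(fire)
    return fire


def sum_constraint(a, b, total):
    """a + b = total, wired so information can flow in every direction."""
    propagator([a, b], total, lambda x, y: x + y)   # bottom up
    propagator([total, b], a, lambda t, y: t - y)   # top down
    propagator([total, a], b, lambda t, x: t - x)   # top down


def run_to_quiescence(agenda):
    while agenda:
        agenda.pop()(agenda)


a, b, total = Cell("a"), Cell("b"), Cell("total")
sum_constraint(a, b, total)

agenda = []
a.add_content(Interval(1.0, 3.0), agenda)      # fragmentary low-level evidence
total.add_content(Interval(5.0, 5.0), agenda)  # a high-level expectation
run_to_quiescence(agenda)
b_after_expectation = b.content                # narrowed to [2, 4]

b.add_content(Interval(3.5, 10.0), agenda)     # new evidence about b
run_to_quiescence(agenda)
a_after_evidence = a.content                   # narrowed to [1, 1.5]
```

In this sketch the expectation on `total` constrains `b`, and later evidence about `b` flows back to constrain `a`, with no designated input or output side. Because a cell wakes its watchers only when its content actually changes, the agenda empties on its own in this example.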