RI: SMALL: Recognizing objects in images and their properties over time

$419,453 | FY2020 | CSE | NSF

University of Texas at Austin, Austin, TX


Abstract

Computer vision lives in the golden age of datasets. All aspects of human vision are systematically mapped and transcribed into an ever-larger pool of labeled data, all with one single goal: teach a vision system, nowadays a deep network, to imitate all aspects of human perception. The current recipe is simple: collect sufficient labeled data, then use supervised machine learning to mimic the supervision. There is just one issue with this approach: systems trained this way are limited to imitating a single narrow task. In this project, we take a step towards unifying many vision tasks into one single system: a framework that infers all properties of all things through time. If successful, this system can identify objects and all their properties in any new unseen image, and bring the full power of computer vision to the non-expert. Applications include autonomous agents that interact with the world by manipulating objects, and assistive technologies for the elderly that observe the world through moving objects and their properties.

The project will pursue three research thrusts.

1. Detecting all objects: In object detection, datasets specialize in domains. Driving datasets describe any vehicle type imaginable, indoor datasets focus on common household objects, and pedestrian datasets focus exclusively on humans. How can we train an object detection system that leverages all these sources of data? How can we relate these different data sources to each other? How do we deal with partial annotation in some data sources?

2. Inferring all properties: Object detection forms the basic building block for many aspects of visual reasoning. However, the most interesting tasks start after detection: What is the 2D or 3D pose of an object? Is this object deformable? Could it be a danger to an autonomous vehicle? Again, there are hundreds of tasks and data sources that describe all the properties of objects. How can we learn a detector that infers them all?

3. Recognition through time: Finally, detection should not be isolated in time. How do we reason about objects and properties through time? Can we learn to recognize objects in a temporally coherent manner using current image-based datasets?

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
