Comp Cog: Collaborative Research on the Development of Visual Object Recognition

$313,582 · FY2015 · SBE · NSF

Georgia Tech Research Corporation, Atlanta GA

Investigators

James Rehg (contact), Fuxin Li, Maithilee Kunda

Abstract

Human visual object recognition is fast and robust. People can recognize a large number of visual objects in complex scenes, from varied views, and in less than optimal circumstances. This ability underlies many advanced human skills, including tool use, reading, and navigation. Artificial intelligence devices do not yet approach the level of skill of everyday human object recognition. This project will address one gap in current knowledge, an understanding of the visual experiences that allow skilled object recognition to develop, by capturing and analyzing the visual experiences of 1- to 2-year-old toddlers. This is a key period for understanding human visual object recognition because it is the time when toddlers learn a large number of object categories, when they learn the names for those objects, and when they instrumentally act on and use objects as tools. Two-year-old children, unlike computer vision systems, rapidly learn to recognize many visual objects. This project seeks to understand how the training experiences (everyday object viewing) of toddlers may be optimal for building robust visual object recognition. The project aims to (1) understand the visual and statistical regularities in 1- to 2-year-old children's experiences of common objects (e.g., cups, chairs, trucks, dogs) and (2) determine whether a training regimen like that experienced by human toddlers supports visual object recognition by state-of-the-art machine vision. Considerable progress in understanding adult vision has been made by studying the visual statistics of "natural scenes." However, there is concern about possible artifacts in these scenes because they are typically photographs taken by adults and are thus potentially biased by the already developed mature visual system that holds the camera and frames the pictures. Also, photographed scenes differ systematically from the scenes sampled by people as they move about and act in the world.
Accordingly, there is increased interest in egocentric views collected from body-worn cameras, the method used in the present work. Toddlers will wear lightweight head cameras as they go about their daily activities, allowing the investigators to capture the objects the toddlers see and the perspectives and contexts in which they see them. The research will analyze the frequency, views, visual properties, and range of seen objects for the first 100 object names normatively learned by young children, providing a first description of the early learning environment for human visual object recognition. These toddler-perspective scenes will be used as inputs to machine learning models to better understand how the visual information in the scenes supports and constrains the development of visual object recognition. Machine-learning experiments will determine which properties and statistical regularities are most critical for learning to recognize common object categories in multiple scene contexts. Data collected will be shared through Databrary, an open data library for developmental science.
