CRCNS US-Japan Research Proposal: Modeling the Dynamic Topological Representation of the Primate Visual System

$680,000 | FY2022 | CSE | NSF

University of California-San Diego, La Jolla, CA

Investigators

Garrison W Cottrell (contact), Virginia De Sa

Abstract

The goal of this project is to understand how we see by building computer models that "see the way we do." It is obvious that we learn to talk; it is less obvious that we learn to see. Babies have roughly 20/400 vision, which means they are legally blind, and the world initially looks very blurry to them. They must learn to distinguish people (especially their mother and family) as well as toys, food, and other objects over months and years of development. How is it that we come to see so well that we can play ball, read a book, and thread a needle? One way to understand how this happens is to build computational models that mimic the way the brain works. Artificial intelligence has blossomed in recent years with the advent of deep neural networks, which are highly simplified models of the brain. They are capable of recognizing faces and objects, and are enabling the creation of self-driving cars. However, there are fundamental differences between these computer vision models and our own visual system that make the models less robust. This project will add more features of the human visual system to these models. For example, we have a foveated retina, which enables high-fidelity vision only within a small spot of the visual field, about the size of your thumbnail at arm's length. As a result, we move our eyes about three times a second in order to bring the world into focus. This project will build a computational model that has a foveated retina, "moves its eyes," and takes data from brain recordings into account. Recent models of the visual system have been benchmarked against cortical recordings (CORnet, BrainScore), but appear to be reaching a plateau. To move beyond this, the next generation of models will have to come closer to the brain in both anatomy and physiology. This project will incorporate radical changes to convolutional networks as well as novel data from the primate visual system.
Missing from most models of the visual system are: 1) biologically realistic lateral and feedback connections, including distinct pools of excitatory (E) and inhibitory (I) neurons with the full set of lateral interactions (E->E, E->I, I->E, I->I), and purely excitatory feedback connections; 2) the log-polar mapping from the retina to primary visual cortex (V1), separating central from peripheral representations and adding rotation and scale invariance; and 3) saccades, adding dynamics to the representations. Missing from most neurophysiological recordings are: 1) recordings from inferotemporal cortex (IT) during free viewing of objects (saccading); 2) pharmacological suppression of central and peripheral V1 while recording from IT, in order to measure their contributions to representations; and 3) simultaneous recording from multiple areas of IT, providing crucial data on their interactions. This project will incorporate all of these advances in order to build biologically realistic vision systems. A companion project is being funded by the National Institute of Information and Communications Technology, Japan (NICT). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
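
To illustrate why a log-polar mapping is associated with rotation and scale invariance, here is a minimal sketch. It is not the project's model: it assumes an idealized map that sends a retinal point (x, y), measured from the fovea, to (log r, theta), and the function names `log_polar` and `scale_and_rotate` are hypothetical. Under that assumption, scaling an image about the fovea shifts every point along the log-radius axis, and rotating it shifts every point along the angle axis, so both transformations become simple translations that a convolutional network handles naturally.

```python
import math

def log_polar(x, y):
    """Idealized retina-to-cortex map: (x, y) relative to the fovea -> (log r, theta)."""
    r = math.hypot(x, y)
    if r == 0.0:
        # The exact foveal center has no finite image under a pure log map.
        raise ValueError("log-polar coordinates are undefined at r = 0")
    return math.log(r), math.atan2(y, x)

def scale_and_rotate(x, y, scale, angle):
    """Scale a point about the origin (fovea) and rotate it by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

# An arbitrary retinal point, chosen only for illustration.
x0, y0 = 3.0, 1.0
u0, v0 = log_polar(x0, y0)

# Double the size of the image and rotate it by 30 degrees about the fovea.
x1, y1 = scale_and_rotate(x0, y0, scale=2.0, angle=math.pi / 6)
u1, v1 = log_polar(x1, y1)

# In log-polar coordinates the transformation is a pure translation.
shift_u = u1 - u0  # equals log(scale)
shift_v = v1 - v0  # equals angle (no wrap-around occurs for this point)
print(shift_u, shift_v)
```

Because the shift is the same for every point, an object that grows or rotates about the point of fixation keeps its shape in the mapped representation and only changes position. The angle axis wraps around at plus or minus pi, which a fuller implementation would need to handle.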
