NCS-FO: Neuroimaging to Advance Computer Vision, NLP, and AI

$1,000,000FY2017CSENSF

Purdue University, West Lafayette IN

Investigators

Jeffrey M Siskindcontact Evguenia A Malaia Ronnie B Wilbur

Abstract

It is often said that a picture is worth a thousand words. Frequently, to search for what is needed, whether images or objects in those images, words are needed instead. Getting accurate labels for efficient searches is a longstanding goal of computer vision, but progress has been slow. This project employs new methods to significantly change how picture-word labeling is accomplished by taking advantage of the best picture recognizer available, the human brain. Through functional magnetic resonance imaging and electroencephalography, brain activity of humans looking at pictures/videos is recorded and then used to improve performance on artificial intelligence (AI) tasks involving computer vision and natural language processing. Current systems use machine learning to train computers to recognize objects (nouns) and activities (verbs) in images/video, which are then used to describe events. Reasoning tasks (e.g., solving math problems) can then be done. These systems are trained on specially prepared datasets with samples of nouns for objects, verbs for activities, sentences describing events, and exam questions and answers. A novel paradigm using humans to perform the same tasks while their brains are scanned allows determination of neural patterns associated with those tasks. The brain activity patterns, in turn, are used to train better computer systems. The central hypothesis is that understanding human processing of grounded language involving predication and its use during reasoning will materially improve engineered computer vision, natural language processing, and AI systems that perform image/video captioning, visual question answering, and problem solving. Scientific and engineering goals include developing models of human language grounding and reasoning consistent with neuroimaging, to improve engineered systems integrating language and vision that support automated reasoning. The main scientific question is to understand mechanisms by which predicates and arguments are identified, linked, and used for reasoning by the human brain. The hypothesis, that predicate-argument linking in visual and linguistic representations are accomplished similarly, and that this then supports reasoning and problem solving, will be tested using multiple neuroimaging modalities, and machine learning algorithms to decode "who did what to whom" from brain scans of subjects processing linguistic and visual stimuli. The iterative approach will involve understanding information integration at the neural level, to improve machine learning performance on AI tasks by training computers to perform increasingly complex tasks with neuroimaging data from stimuli derived from large-scale natural tasks. Using identical datasets for human and machine performance will support translation of scientific advances to engineering practice involving integration of computer vision and natural language processing. This award is cofunded by the Office of International Science and Engineering.

View original record on NSF Award Search →