CRII: RI: Joint Models of Language and Context for Robotic Language Acquisition

$163,057 | FY2017 | CSE | NSF

University of Maryland Baltimore County, Baltimore, MD

Investigators

Abstract

As robots become smaller, less expensive, and more capable, they are able to perform an increasing variety of tasks, leading to revolutionary improvements in domains such as automobile safety and manufacturing. However, their inflexibility makes them hard to deploy in human-centric environments such as homes and schools, where their tasks and environments are constantly changing. Meanwhile, learning to understand language about the physical world is a growing research area in both robotics and natural language processing. The core problem is how the meanings of words are grounded in the noisy, perceptual world in which a robot operates. This project explores how robots can learn about the world from natural language in order to take instructions and learn about their environment naturally and intuitively from people. The ability to follow directions reduces the adoption barrier for robots in domains such as assistive technology, education, and caretaking, where interactions with non-specialists are crucial. Such robots have the potential to ultimately improve autonomy and independence for populations such as aging-in-place elders; for example, a manipulator arm that can learn from a user's explanation how to handle food or open novel containers would directly affect the independence of persons with dexterity concerns such as advanced arthritis.

This is an exploratory investigation of how linguistic and perceptual models can be expanded during interaction, allowing robots to understand novel language about unanticipated domains. In particular, the focus is on developing new learning approaches that correctly induce joint models of language and perception, building data-driven language models that add new semantic representations over time. The work combines semantic parser learning, which provides a distribution over possible interpretations of language, with perceptual representations of the underlying world. New concepts are added on the fly as new words and new perceptual data are encountered, and a semantically meaningful model can be trained by maximizing the expected likelihood of language and visual components. This integrated approach allows for effective model updates with no explicit labeling of words or percepts. This approach will be combined with experiments on improving learning efficiency by incorporating active learning, leveraging a robot's ability to ask questions about objects in the world.
