CAREER: Toward a General Framework for Words and Pictures
University of North Carolina at Chapel Hill, Chapel Hill, NC
Investigators
Abstract
Pictures convey a visual description of the world directly to their viewers. Computer vision strives to design algorithms that extract the underlying world state captured in the camera's eye, with an overarching goal of general computational image understanding. To date, much vision research has approached image understanding by focusing on object detection, which is only one perspective on the problem. This project looks at an additional, complementary way to collect information about the visual world: directly analyzing the enormous amount of visually descriptive text on the web to reveal what information is useful to attach to, and extract from, pictures. The project presents a comprehensive research program geared toward modeling and exploiting the complementary nature of words and pictures. One main goal is studying the connection between text and images to learn about depiction, the communication of meaning through pictures. This goal is addressed through three broad challenges: 1) developing a richer vocabulary to describe the information provided by depiction; 2) developing image representations that can visually capture this more nuanced vocabulary; and 3) constructing a comprehensive joint framework for words and pictures. This project has direct significance for many concrete tasks that access images on the internet, including image search, browsing, and organization, as well as commercial applications such as product search and societally important applications such as web assistance for the blind. Additionally, outputs of this project, including progress toward a natural vocabulary and structure for visual description, have great potential for cross-cutting impact in both the computer vision and natural language communities.