CAREER: Discriminative Spatiotemporal Models for Recognizing Humans, Objects, and their Interactions

$106,011 | FY2015 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Abstract

One of the goals of computer vision is to build a system that can see people and recognize their activities. Human actions are rarely performed in isolation: the surrounding environment, nearby objects, and nearby people all affect the nature of the activity. Examples include actions such as "eating" and "shaking hands." The research goal of this project is to approach human-level performance in understanding videos of activities defined by human-object and human-human interactions. The project uses structured, contextual representations to make predictions from spatiotemporal data. It does so by extending recent successful work on object recognition to the space-time domain, introducing extensions for spatiotemporal grouping and contextual modeling. Video enables the extraction of dynamic cues that are absent in static images, but it also imposes additional computational burdens, which are addressed through algorithmic innovations for approximate parsing and large-scale discriminative learning. To place activity recognition on firm quantitative ground, the proposed models are evaluated using concrete metrics based on activities of daily living (ADL) and human proxemic models from the medical and anthropological communities. Example applications include systems for automated monitoring of stroke patients interacting with everyday objects and automated analysis of crisis response team interactions during emergency drills. The project also produces unscripted, real-world, labeled action recognition datasets that benefit the research community as a whole.
