EAGER: Construction of Social Interactions in 3D Space from First-Person Videos

$200,000 · FY2016 · CSE · NSF

University of Pennsylvania, Philadelphia, PA

Investigators

Abstract

Precision modeling tools for realistic and complex human social interaction are not available today. First-person videos provide a unique opportunity to capture social interaction at unprecedented precision. In contrast, current third-person surveillance video passively records only a few distant views of the interaction at a much reduced spatial resolution. This exploratory research project proposes to harness multiple first-person cameras as one collective instrument to capture, model, and predict social behaviors. The proposed research transforms the way we construct realistic social interaction models, while also advancing first-person video recognition. If successful, the envisioned computational model can act as a coach that learns what constitutes successful interactions and failures, and can thus find solutions to mediate and prevent potential conflicts. The proposed research will model dynamic social interactions in 3D space from multiple personal perspectives. Recognition and prediction of complex social group interactions are challenging because people in the group can carry out unexpected actions intentionally or by mistake. In addition, due to variances in individuals' preferences and abilities, the same activities could be carried out in different ways. First-person videos can be highly jittery, resulting in fast and unpredictable object motions in the field of view. Building on the PI's recent work establishing computational foundations for modeling social (people-people) and personal (people-scene) interactions using first-person cameras, this research will explore the novel concept of the duality between social attention and roles: social attention provides a cue for recognizing social roles, and social roles facilitate the prediction of dynamic changes in social formation and the associated social attention.
The formal foundation of the 3D model is based on constructing a visual memory that stores first-person social experiences in three forms: (a) the geometric social formation, (b) the visual image of the first-person view, and (c) the first person as seen from nearby third-person views. As a proof of concept, the 3D space model capturing social interactions will be tested on collaborative social tasks such as assembling IKEA furniture or building a block house with a group of friends. This research will construct a labeled dataset capturing the interactions, and will analyze both the accuracy in recognizing social roles and the precision in predicting the spatial movements of the members in that social interaction. The results of this project, including papers and the dataset, will be disseminated to the public through our project website (http://www.cis.upenn.edu/~jshi/NSF_SocialMemory/nsf_social_visual_memory.html). The software created under this project will be made available to the public through GitHub, a web-based Git repository hosting service.
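As an illustration only, the three-part visual memory described above might be laid out as a simple record type. This is a minimal sketch and not the project's software: the class names, field names, the pose representation, and the use of file paths to refer to frames are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical pose: (x, y, z, heading in radians) of one person in 3D space.
Pose = Tuple[float, float, float, float]


@dataclass
class MemoryEntry:
    """One first-person social experience at a single time step (assumed layout)."""
    timestamp: float
    wearer_id: str
    # (a) geometric social formation: pose of every group member, keyed by id
    formation: Dict[str, Pose]
    # (b) visual image of the first-person view (here, a path to the frame)
    first_person_frame: str
    # (c) the wearer as seen from nearby third-person views, keyed by observer id
    third_person_frames: Dict[str, str] = field(default_factory=dict)


@dataclass
class VisualMemory:
    """Append-only store of entries, queryable per camera wearer."""
    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def for_wearer(self, wearer_id: str) -> List[MemoryEntry]:
        """Return all stored experiences recorded from one person's perspective."""
        return [e for e in self.entries if e.wearer_id == wearer_id]


memory = VisualMemory()
memory.add(MemoryEntry(
    timestamp=0.0,
    wearer_id="A",
    formation={"A": (0.0, 0.0, 1.6, 0.0), "B": (1.0, 0.0, 1.7, 3.14)},
    first_person_frame="A/frame_0000.jpg",
    third_person_frames={"B": "B/frame_0000.jpg"},
))
```

Keeping the three forms side by side in one entry means a query for a given wearer returns the formation, the wearer's own view, and the views of the wearer from others at the same time step.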
