CAREER: Towards Causal Multi-Modal Understanding with Event Partonomy and Active Perception

$496,524 | FY2023 | CSE | NSF

Auburn University, Auburn AL

Investigators

Abstract

Events are central to causal, visual understanding in complex, dynamic environments. From collaborative robots that assist humans with complex tasks to surveillance systems that detect anomalous behavior, effective machine perception requires understanding events, their composition, and their interactions. This project will explore how events are structured in multimodal data and how that structure can be used to design better embodied agents that construct and use compositional event representations to function in complex, real-world environments. The developed algorithms could have a broad impact in numerous fields, including Artificial Intelligence (AI) and education, such as the future of workforce training. In addition to its scientific impact, the project performs complementary educational and outreach activities. Specifically, it engages the broader scientific community in the use of AI and computer vision (CV) research to augment the future of workforce training through workshops and seminars, introduces and enhances AI and CV education at Oklahoma State University, and fosters an entrepreneurial mindset in computer science education and research through integrated educational activities.

The research focuses on energy-based neuro-symbolic learning, using Grenander’s Pattern Theory formalism, abductive reasoning, and active embodied vision to learn and use temporal causality for richer, multimodal event understanding. The specific research aims of the project are three-fold. First, it seeks to learn the partonomy of common, everyday events by expressing their hierarchical structure in the form of Bayesian Rose Trees, whose semantics are populated by an energy-based pattern theory inference engine. Second, it will research ways to leverage this event partonomy to understand actions in videos beyond recognition and to perceive the current action in the context of the overall task being performed. This inference mechanism will enable an embodied, intelligent agent to recognize the current action and infer higher-level concepts such as human intent and goals in a unified energy-based framework. Third, it will realize the partonomy-based understanding framework in an embodied agent while augmenting it with active multimodal feedback. This will allow the embodied agent to reason actively through feedback from the environment, controlling its geometric parameters (such as position, orientation, and pose) to navigate clutter and resolve ambiguity in the perceived event structure.

This project is jointly funded by the Robust Intelligence (RI) Program and the Established Program to Stimulate Competitive Research (EPSCoR). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
