CAREER: Interaction-oriented 3D Representation Learning on Point Cloud

$600,000 | FY2023 | CSE | NSF

University of California-San Diego, La Jolla, CA

Investigators

Abstract

The central objective in the field of three-dimensional (3D) vision is to leverage perception to plan and execute actions effectively. Consider the process of creating action plans based on visual observations. One might wish to understand the possible alterations to an object after it undergoes a specific action. Interestingly, humans often acquire this knowledge through experiential learning, implying a strong interplay between perception, cognition, and interaction. Inspired by this intertwined relationship, this project endeavors to examine deep learning methodologies for 3D vision within the context of this perception-cognition-interaction cycle. The project will capitalize on recent advances in Computer Vision, Machine Learning, and Computer Graphics across three pivotal areas: 3D point cloud data learning, closed-loop policy learning frameworks, and realistic simulation environments. The principal methodology will entail a careful analysis of the relationship between 3D understanding and interaction, and the design of innovative learning frameworks. These frameworks will learn representations from 3D point clouds both for the prediction of actions and from the consequences of actions. The project will enhance the understanding of the three-dimensional world within physically embodied artificial intelligence systems (embodied AI). The ultimate goal is to construct AI systems that can optimally learn from interactive experiences. This research benefits many applications, such as smart manufacturing, exploratory robotics, autonomous driving, and augmented reality devices for life and work assistance.

This project breaks down the understanding of 3D point cloud data into three distinct categories: comprehension of object structure, grasp of kinematics and dynamics, and perception of interaction. For each category, the team will develop novel frameworks that encapsulate the perception-cognition-interaction cycle. Moreover, the research will probe into learning algorithms and 3D neural network architectures that can seamlessly integrate into the cycle. Given the perspective of this research, the project is also expected to unearth a series of challenges not yet extensively researched in the 3D vision literature. The team will strive to uncover innovative, principle-based solutions to these challenges. Because the project treads new ground and existing datasets are insufficient, the team will also collect interaction data in both virtual and real-world settings. The research philosophy builds on the investigator's prior work in 3D deep learning, generalizable policy learning, and robot simulator design. This foundational work paves the way for 3D representations that can inform the design and learning of interaction policies adaptable across diverse environments and tasks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
