CCRI: Planning: A Community-Standard, Large-Scale Synthetic 3D Scene Dataset for Scene Analysis and Synthesis

$50,000 | FY2020 | CSE | NSF

Brown University, Providence RI

Investigators

Abstract

To function as useful household assistants, robots need to understand what they are seeing and how to navigate in indoor environments. The current state-of-the-art approaches to these problems rely on machine learning, and in particular deep learning, which requires large quantities of labeled data (e.g., many images with per-pixel labels indicating what type of object is present at each pixel). Rather than asking people to laboriously label data captured from real-world spaces, a promising alternative is to use *synthetic* 3D scenes: virtual 3D models of indoor spaces. The 3D objects that populate these virtual spaces can be annotated with information such as their object type, which allows large sets of labeled training data to be created essentially “for free.” This project aims to construct *the* community-standard, large-scale synthetic 3D scene dataset. While some synthetic 3D scene datasets exist, they are either too small or have been subject to onerous use restrictions (and even lawsuits) due to copyright issues with their 3D models, which typically come from for-profit companies. This project will construct a large-scale dataset out of freely available 3D content. The main contribution of the project is not just this dataset, but also a *scalable pipeline* for creating such 3D scene datasets. This pipeline will be released as open source, allowing others to expand the dataset or to construct their own datasets for needs that may be difficult to anticipate today. Taken together, the results of this project will enable any researcher (not just those at heavily resourced institutions) to build AI systems that leverage large-scale synthetic indoor training data.

The planned dataset construction pipeline will build 3D scenes from 2D floor plan datasets, which already exist at large scale. Using a machine-learning-based system previously developed by the investigators, these 2D floor plans will be converted to 3D models of empty houses. Then, each room in the house will be populated with objects in a plausible arrangement. Initially, this step will be performed by crowd workers on a platform such as Amazon Mechanical Turk. The workers will be instructed to place objects so as to match a photograph, where the photograph is chosen such that its (estimated) room geometry matches the geometry of the empty room to be populated. In a later stage of the project, rooms populated in this manner will be used to train a machine learning model that can automatically place objects based on an input photograph, further accelerating the dataset construction process.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
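
The abstract does not say how a photograph is matched to an empty room, so the following is only an illustrative sketch of that selection step, not the investigators' method. It assumes each room, and each photograph's estimated room geometry, can be summarized as a box of width, depth, and height in meters. The names `RoomBox`, `geometry_distance`, and `pick_reference_photo` and the sample photo ids are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class RoomBox:
    """Hypothetical room summary: box dimensions in meters (an assumption)."""
    width: float
    depth: float
    height: float


def geometry_distance(a: RoomBox, b: RoomBox) -> float:
    """Relative size mismatch between two rooms, allowing a 90-degree rotation."""
    def dist(w: float, d: float) -> float:
        # Each difference is normalized by room a's dimension so that
        # large and small rooms are compared on the same relative scale.
        return math.sqrt(
            ((a.width - w) / a.width) ** 2
            + ((a.depth - d) / a.depth) ** 2
            + ((a.height - b.height) / a.height) ** 2
        )
    # Try b as given and with width/depth swapped; keep the better fit.
    return min(dist(b.width, b.depth), dist(b.depth, b.width))


def pick_reference_photo(room: RoomBox, photos: dict[str, RoomBox]) -> str:
    """Return the id of the photo whose estimated geometry best matches room."""
    return min(photos, key=lambda pid: geometry_distance(room, photos[pid]))


# Made-up example data: one empty room and three candidate photographs.
empty_room = RoomBox(width=4.0, depth=3.0, height=2.5)
photos = {
    "photo_a": RoomBox(width=6.0, depth=5.0, height=2.7),
    "photo_b": RoomBox(width=3.1, depth=4.1, height=2.5),
    "photo_c": RoomBox(width=4.0, depth=3.0, height=3.5),
}
best = pick_reference_photo(empty_room, photos)
```

Here `best` is `"photo_b"`, whose footprint matches the empty room once width and depth are swapped. A real system would likely compare richer geometry (wall layout, doors, windows) than a single box.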
