HCC: Small: Open-vocabulary Neurosymbolic 3D Models

$477,651 · FY2025 · CSE · NSF

Brown University, Providence RI

Abstract

The demand for 3D assets is increasing across many fields, including synthetic data for computer vision and robotics, mixed reality, interior design, furniture retail, real estate, and games with user-created content. It would be valuable to make 3D asset creation accessible to users in these fields who lack 3D modeling expertise. Recently, there has been an explosion of development in machine learning models that take as input a text description of any visual concept and output a 3D asset. While such models have great potential, they are limited by their reliance on text as the sole means of user control: text is ill-suited to precisely specifying many attributes of a 3D asset (e.g. lengths or sizes), these models do not perfectly respect attributes specified in text (e.g. counts of or spatial relations between objects), and users must resort to tedious and often ineffective prompt tweaking to modify the outputs. This research project seeks to address these issues by developing a new class of machine learning model for 3D asset creation. These new models will take text descriptions as input, but instead of generating 3D geometry directly, they will generate computer programs which produce 3D objects. The code of each program will be meaningfully structured, exposing parameters which users can modify to produce desired changes in the output 3D object. These output objects will be decomposed into meaningful parts, where each part has detailed 3D geometry and texture. The researchers will leverage large language models (LLMs) to produce programs with meaningful structure and parameters. Since LLMs are error-prone, they will be used to propose procedural abstractions for modeling the part structure of objects; these abstractions will then be filtered and refined based on their ability to reconstruct 3D assets in a dataset. The researchers will then train neural networks to generate programs which use the final library of abstractions.
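As a rough illustration of what "a program that produces a 3D object and exposes editable parameters" could look like, the following Python sketch builds a table from named box parts. Everything in it is an assumption made for illustration: the `Box` type, the `table` function, its parameter names, and the default dimensions are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """An axis-aligned box part: a name, a center, and full extents (x, y, z)."""
    name: str
    center: tuple[float, float, float]
    size: tuple[float, float, float]


def table(width: float = 1.2, depth: float = 0.6, height: float = 0.75,
          top_thickness: float = 0.04, leg_thickness: float = 0.05) -> list[Box]:
    """Hypothetical shape program: a four-legged table made of named parts."""
    leg_height = height - top_thickness
    parts = [Box("top", (0.0, 0.0, leg_height + top_thickness / 2),
                 (width, depth, top_thickness))]
    # Place one leg at each corner, inset so it stays under the top.
    dx = width / 2 - leg_thickness / 2
    dy = depth / 2 - leg_thickness / 2
    for i, (sx, sy) in enumerate([(-1, -1), (-1, 1), (1, -1), (1, 1)]):
        parts.append(Box(f"leg_{i}", (sx * dx, sy * dy, leg_height / 2),
                         (leg_thickness, leg_thickness, leg_height)))
    return parts


def bounding_height(parts: list[Box]) -> float:
    """Highest z reached by any part, assuming the floor is at z = 0."""
    return max(p.center[2] + p.size[2] / 2 for p in parts)


if __name__ == "__main__":
    default = table()
    taller = table(height=1.0)  # a single parameter edit changes the object
    print(len(default), bounding_height(default), bounding_height(taller))
```

In this toy setting, changing `height` or `width` gives the kind of precise control over lengths and sizes that the abstract notes is hard to achieve through text prompts alone; the detailed surface geometry and texture of each part are outside the scope of this sketch.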
Next, instead of learning programs from existing 3D asset datasets, the researchers will develop methods for synthesizing on-demand datasets of 3D assets from input text descriptions. This approach builds on recent advances in fast 3D generative models; it also includes a plan for automatically decomposing generated assets into parts. Finally, every structural abstraction in the learned library will be equipped with a module that generates detailed surface geometry and appearance for that structure. These modules will disentangle structure from surface details; the researchers will also explore techniques for enabling interpretable parametric control of these surface details.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
