RI: Small: Semantic 3D Neural Rendering Field Models that are Accurate, Complete, Flexible, and Scalable

$600,000 · FY2023 · CSE · NSF

University of Illinois at Urbana-Champaign, Urbana, IL

Investigators

Abstract

This project will investigate methods to create, from multiple images, a scene model that enables visualization, synthesis, counting, measurement, and other analysis. The goals of the project are driven by the need for unified geometric (where, what shape, how big) and semantic (what is it, what is it like) scene models, based on the investigators' direct experience in building products for construction management and vehicle safety. So far, computer vision has arguably had its largest impact in internet domains. This project is needed for broader applications involving the physical world, and the potential impact is hard to overstate. Resulting capabilities will lay foundations for real-time modeling, augmented reality, simulation, and robotics applications.

The project lays the groundwork for a queryable, editable, and actionable semantic and geometric scene model, a foundational problem in computer vision. Neural rendering fields, vision language models, and diffusion have been impressively demonstrated for separate image synthesis and analysis applications. The project brings these advances together to enable new representations and capabilities for 3D semantic scene modeling. The result is a scalable and robust approach to create, update, query, and edit models of the world inferred from multiple observations.

In particular, the project involves three plans of action. The first is to create measurable and meshable 3D scene models that can be efficiently estimated from sparse views and scale to thousands of images. This includes several developments: new efficiently optimizable, compact representations; incorporation of monocular geometry estimates; joint refinement of pose, gain, and other parameters; methods to scale seamlessly to massive scenes and photo sets; and ways to extract high resolution meshes, floor maps, and other common deliverables. The second plan of action is to incorporate semantic information and decoders for counting, measuring, and change detection. This includes encoding semantics in continuous embeddings and creating decoders for visualizing, counting, measuring, and other scene-wide geometric-semantic queries, to enable real-time, flexible mapping and facility assessment. The third plan of action is to extrapolate beyond direct observations and infer and update models as new observations arrive by integrating generative and predictive processes.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
