Collaborative Research: SHF: Medium: Semantics-Aware Neural Models of Code

$400,000 | FY2022 | CSE | NSF

William Marsh Rice University, Houston TX

Abstract

Large neural models trained on massive amounts of code still often produce code of poor quality, with such elementary errors as uninitialized variables, type-incorrect expressions, and loops that never terminate. Such errors can be a source of insidious software vulnerabilities. The root cause of these issues is that the models treat programs as syntactic rather than semantic artifacts, during both training and generation. The project's novelty is to couple such models with symbolic, semantics-aware methods for program synthesis developed in the formal-methods community. The project develops a neurosymbolic program-synthesis framework that tightly integrates deep learning with classical symbolic synthesis techniques. The research explores three directions: new learning algorithms in which neural models of code (specifically, transformers) are exposed to explicit knowledge about program semantics; mechanisms in which transformers guide specification-directed synthesizers; and combinations of classical synthesis and learned models that construct novel compositions of neurally generated programs. The project's impact is a unified framework for semantics-aware program synthesis, yielding better tools for automatically creating programs. The project also develops a cross-institution Research Experiences for Undergraduates (REU) program, with a special focus on recruiting women, Hispanic, and Black students.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.