Collaborative Research: SHF: Medium: Semantic Aware Code Generation with Large Language Models

$675,000 | FY2025 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Limin Jia (contact), Beidi Chen, Corina Pasareanu

Abstract

Large Language Models (LLMs) show great promise for generating source code and automating programming tasks. But these models are error-prone and can produce code with subtle bugs. This poses a risk for deploying LLMs in industrial settings for software engineering tasks: subtly erroneous LLM-generated code can introduce vulnerabilities that compromise system security. It has been shown that the weakness of LLMs for code generation primarily stems from not accounting for the semantic properties of programs when training, using, and evaluating these models. This project aims to improve LLMs’ ability to generate high-quality code by deeply integrating program analyses with all the stages in the life cycle of LLMs: training, code generation, and evaluation.

This project develops novel quantitative program analysis techniques to provide feedback to LLMs during training and decoding. First, the project leverages symbolic execution and Bayesian program analyses to design meaningful metrics to evaluate LLM-generated code. The project then uses the resulting program scores to train a differentiable reward model that can assess the quality of partial or complete generated code. At training time, inspired by Reinforcement Learning from Human Feedback (RLHF), the project uses the reward model to fine-tune LLMs to generate high-quality code. To improve code generation at decoding time, the project leverages the reward model and similarity-based program ranking techniques to constrain and prune the decoding tree. Finally, the project develops semantics-guided metrics and collects new benchmarks consisting of realistic coding tasks for training and evaluating code LLMs.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
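
The decoding-time idea in the abstract, using a reward model that scores partial programs to constrain and prune the decoding tree, can be illustrated with a minimal sketch. This is not the project's method: `toy_reward` is a hypothetical hand-written stand-in for the learned reward model (it only checks parenthesis balance), and `reward_guided_beam` is an assumed, simplified beam search over a three-token vocabulary rather than an LLM's output distribution.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, ...]  # a partial program, represented as a tuple of tokens

NEG_INF = float("-inf")


def toy_reward(partial: Candidate) -> float:
    """Hypothetical stand-in for a learned reward model over partial programs.

    Returns -inf for prefixes that can never become balanced (a ")" with no
    matching "("), and otherwise favors longer prefixes with fewer open parens.
    """
    depth = 0
    for tok in partial:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return NEG_INF  # unrecoverable prefix: prune this branch
    return len(partial) - 0.5 * depth


def reward_guided_beam(
    vocab: Sequence[str],
    reward: Callable[[Candidate], float],
    steps: int,
    beam_width: int,
) -> List[Candidate]:
    """Expand every beam by one token per step, drop branches the reward
    rejects outright, and keep the `beam_width` highest-scoring prefixes."""
    beams: List[Candidate] = [()]
    for _ in range(steps):
        expanded = [b + (tok,) for b in beams for tok in vocab]
        scored = [(reward(c), c) for c in expanded]
        viable = [(s, c) for s, c in scored if s != NEG_INF]
        viable.sort(key=lambda sc: sc[0], reverse=True)
        beams = [c for _, c in viable[:beam_width]]
    return beams


if __name__ == "__main__":
    for beam in reward_guided_beam(["(", ")", "x"], toy_reward, steps=3, beam_width=4):
        print(" ".join(beam))
```

In a real system the per-token candidates and their base scores would come from the language model, and the reward would come from a trained model informed by program analysis; here both are replaced by toy components so that the pruning logic itself is easy to follow.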
