Scaling Up Program Synthesis and Verification with Large Language Models

Large Language Models like GPT-4 have driven breakthroughs in program synthesis and power widely used coding assistants such as GitHub Copilot. Yet, their lack of reliability has limited their practical impact and prevented them from solving large-scale software development challenges that require chaining many steps of editing or reasoning without error.

This project will investigate the combination of Large Language Models (LLMs) with proof assistants such as Why3, KeY, and Dafny, which are capable of rigorously reasoning about programs and establishing their correctness. Using the Delphyne framework – which introduces a new foundational paradigm for building modular and reliable LLM-enabled software – you will develop a pipeline for generating provably correct programs.

For more information, please reach out to jonathan.laurent@kit.edu.

[Reading | Tool]