AxDafny: Agentic Verified Code Generation in Dafny
이 뉴스, 어떠셨어요?
한 번의 탭으로 반응을 남겨요 · 로그인 불필요
Abstract
We study agentic code generation in Dafny, where a model must generate both executable code and the proof artifacts for verification.
We present AxDafny, a verifier-guided repair framework that iteratively generates implementations, invariants, assertions, and termination arguments.
We also introduce LiveCodeBench-Pro-Dafny (LCB-Pro-Dafny), a benchmark of 250 competition-style programming problems translated into Dafny with formal specifications and a verifier-based evaluation harness.
On LCB-Pro-Dafny, AxDafny substantially improves verification success over baseline GPT-5.5 performance.
On DafnyBench, AxDafny achieves 92.7\% verification success, outperforming the strongest previously reported proof-hint baseline by 6.5 percentage points.
Lastly, we show that verification success and runtime test performance measure different aspects of generated code.