AxDafny: Agentic Verified Code Generation in Dafny

arXiv CS.AI

이 뉴스, 어떠셨어요?

한 번의 탭으로 반응을 남겨요 · 로그인 불필요

CC BY

이 매체는 공공·자유 라이선스로 본문을 직접 표시합니다.

Abstract

We study agentic code generation in Dafny, where a model must generate both executable code and the proof artifacts for verification.

We present AxDafny, a verifier-guided repair framework that iteratively generates implementations, invariants, assertions, and termination arguments.

We also introduce LiveCodeBench-Pro-Dafny (LCB-Pro-Dafny), a benchmark of 250 competition-style programming problems translated into Dafny with formal specifications and a verifier-based evaluation harness.

On LCB-Pro-Dafny, AxDafny substantially improves verification success over baseline GPT-5.5 performance.

On DafnyBench, AxDafny achieves 92.7\% verification success, outperforming the strongest previously reported proof-hint baseline by 6.5 percentage points.

Lastly, we show that verification success and runtime test performance measure different aspects of generated code.

전문 보기

AxDafny: Agentic Verified Code Generation in Dafny

이 뉴스, 어떠셨어요?

Abstract

관련 뉴스

'research' 카테고리 뉴스

What Drives Interactive Improvement from Feedback?

Contrastive Reflection for Iterative Prompt Optimization

How Can AI Find My Model? A Model-Finding Experimental Study Considering Data Formats, Embeddings, and Retrieval Strategies

arXiv의 다른 기사

Beyond expert users: agents should help users construct preferences, not just elicit them

Investigating Multi-Agent Deliberation in Law

Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering