BLAgent: Agentic RAG for File-Level Bug Localization
Abstract
Bug localization remains a key bottleneck for large language model (LLM)-based software maintenance, where accurately identifying faulty code is essential for debugging, root cause analysis, triage, and automated program repair (APR).
File-level bug localization is especially critical in hierarchical localization and repair pipelines, where incorrect file selection can propagate to downstream stages such as function-level localization and patch generation.
While Retrieval-Augmented Generation (RAG) offers a promising way to ground LLMs in repository context, existing RAG pipelines often rely on static retrieval and lack the reasoning needed to accurately identify faulty code.
In this work, we present BLAgent, a novel agentic RAG framework for file-level bug localization that integrates three key ideas: (i) code structure-aware repository encoding with path-augmented AST-based chunking, (ii) dual-perspective query transformation that captures both structural and behavioral signals from bug reports, and (iii) two-phase agentic reranking that combines symbolic inspection with evidence-grounded reasoning.
Unlike prior graph-based or multi-hop agentic approaches, BLAgent adopts a bounded reasoning strategy that limits LLM-based inspection and reranking to a compact, retrieval-filtered set of candidate files, avoiding open-ended repository traversal.
This design balances localization accuracy with computational cost.
On SWE-bench-Lite, BLAgent attains over 78% Top-1 accuracy with open-source models and over 86% with a closed-source model, while being over 18x cheaper than the strongest baseline using the same model.
When integrated into an APR framework, BLAgent improves end-to-end repair success by up to 25%.
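To make idea (i) concrete, the following is a minimal sketch of what path-augmented AST-based chunking could look like for Python source files. It is an illustration under stated assumptions, not BLAgent's actual implementation: the function name `path_augmented_chunks`, the one-chunk-per-top-level-definition granularity, and the `# path: ... | symbol: ...` header format are all hypothetical choices made here for clarity.

```python
import ast


def path_augmented_chunks(path: str, source: str) -> list[dict]:
    """Split a Python file into one chunk per top-level function or class.

    Each chunk's text is prefixed with the file path and symbol name so that
    a retriever can see where the code lives in the repository.
    (Hypothetical helper; the header format is an assumption.)
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        # Only top-level definitions become chunks; imports and other
        # module-level statements are skipped in this sketch.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            code = ast.get_source_segment(source, node)
            header = f"# path: {path} | symbol: {node.name}"
            chunks.append(
                {"path": path, "symbol": node.name, "text": f"{header}\n{code}"}
            )
    return chunks
```

Chunking along AST boundaries keeps each function or class intact instead of cutting at arbitrary character offsets, and the path header gives the embedding model a file-level signal to match against file names or modules mentioned in a bug report.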