KG2Cypher: Data-Centric Pipeline for Building Enterprise Text-to-Cypher Systems
Abstract
Enterprise Knowledge Graphs (KGs) are increasingly used for internal search, analytics, and question answering, but building natural-language interfaces for private enterprise graphs remains costly.
We present KG2Cypher, a data-centric pipeline for building enterprise text-to-Cypher systems from existing KGs.
KG2Cypher first constructs an executable Cypher query from observed graph facts and then uses LLMs to generate its associated natural-language question.
The resulting Text-Cypher pairs are validated by an LLM judge and by human reviewers, then converted into candidate-aware SFT data.
The trained generator is served with class-conditioned schema prompting, entity retrieval, and LoRA-based inference.
We evaluate KG2Cypher in Korean enterprise settings, where short search-style queries and schema paraphrases make language grounding difficult.
LoRA SFT improves execution-result F1 from 0.806 to 0.950 on broadcast-program queries and from 0.70 to 0.92 on company queries.
In an 11-class setting, KG2Cypher achieves 95.2% exact match, 99.9% execution rate, and 0.964 execution-result F1.
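The abstract does not define execution-result F1, so the following is only a minimal sketch of one common reading: the rows returned by the predicted Cypher query are compared with the rows returned by the gold query as multisets, and F1 is computed from their overlap. The function name `execution_result_f1`, the tuple-based row representation, and the convention that two empty results score 1.0 are assumptions made for illustration, not details confirmed by the source.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

# A result row is modeled as a tuple of hashable values (an assumption).
Row = Tuple[Hashable, ...]


def execution_result_f1(predicted: Iterable[Row], gold: Iterable[Row]) -> float:
    """Multiset F1 between predicted and gold query results.

    Assumed convention: if both result sets are empty, the score is 1.0.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)

    # Both queries returned nothing: treat as a perfect match (assumption).
    if not pred_counts and not gold_counts:
        return 1.0

    # Counter intersection keeps the minimum count of each shared row.
    overlap = sum((pred_counts & gold_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical rows, as if returned by executing two Cypher queries.
predicted_rows = [("Program A",), ("Program B",), ("Program C",)]
gold_rows = [("Program A",), ("Program B",), ("Program D",)]
score = execution_result_f1(predicted_rows, gold_rows)
```

In this hypothetical example two of the three predicted rows also appear in the gold result, so precision and recall are both 2/3. A corpus-level figure such as the ones reported above would typically be an average of per-question scores, though the abstract does not say which averaging the authors use.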