KG2Cypher: Data-Centric Pipeline for Building Enterprise Text-to-Cypher Systems

arXiv CS.AI

이 뉴스, 어떠셨어요?

한 번의 탭으로 반응을 남겨요 · 로그인 불필요

CC BY

이 매체는 공공·자유 라이선스로 본문을 직접 표시합니다.

Abstract

Enterprise Knowledge Graphs (KGs) are increasingly used for internal search, analytics, and question answering, but building natural-language interfaces for private enterprise graphs remains costly.

We present KG2Cypher, a data-centric pipeline for building enterprise text-to-Cypher systems from existing KGs.

KG2Cypher first constructs an executable Cypher query from observed graph facts and then uses LLMs to generate its associated natural-language question.

The resulting Text-Cypher pairs are validated with an LLM judge and human validation, and are converted into candidate-aware SFT data.

The trained generator is served with class-conditioned schema prompting, entity retrieval, and LoRA-based inference.

We evaluate KG2Cypher in Korean enterprise settings, where short search-style queries and schema paraphrases make language grounding difficult.

LoRA SFT improves execution-result F1 from 0.806 to 0.950 on broadcast-program queries and from 0.70 to 0.92 on company queries.

In an 11-class setting, KG2Cypher achieves 95.2% exact match, 99.9% execution rate, and 0.964 execution-result F1.

전문 보기

KG2Cypher: Data-Centric Pipeline for Building Enterprise Text-to-Cypher Systems

이 뉴스, 어떠셨어요?

Abstract

관련 뉴스

'research' 카테고리 뉴스

AI-Model Network: Concept, Current State and Future

When Does Personality Composition Matter for Multi-Agent LLM Teams?

Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning

arXiv의 다른 기사

MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thinking Synergy

ToE: A Hierarchical and Explainable Claim Verification Framework with Dynamic Multi-source Evidence Retrieval and Aggregation

Towards Reliable and Robust LLM Planning: Symbolic Feedback-Driven Iterative Self-Refinement Framework