Single and Multi Truth Data Fusion using Large Language Models
Abstract
Data fusion, also known as truth discovery, is a data integration problem that aims to determine the correct value or set of values for each attribute of an object when presented with potentially conflicting values from multiple sources.
Data fusion tasks belong to two main categories: single-truth scenarios, where each attribute has only one correct value, and multi-truth scenarios, where multiple values can be valid simultaneously.
This paper investigates the use of Large Language Models (LLMs) in data fusion tasks for tabular data.
Various prompting strategies, encompassing both single-truth and multi-truth scenarios, are investigated empirically.
Domain-dependent, domain-independent, zero-shot and one-shot prompts are evaluated on three different benchmark datasets.
Experimental results demonstrate that LLM-based approaches outperform traditional unsupervised truth discovery methods, such as DART and LTM, across all datasets.
The codebase of this study has been made publicly available on GitHub.
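To make the single-truth and multi-truth settings concrete, the following is a minimal illustrative sketch and is not taken from the paper's codebase. It uses naive source voting as a stand-in baseline (not DART or LTM) and a hypothetical prompt builder; the claim data, function names, the `min_support` threshold, and the prompt wording are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical conflicting claims about one attribute (author) of one object.
claims = {
    "s1": ["A. Smith"],
    "s2": ["A. Smith", "B. Jones"],
    "s3": ["A. Smyth"],
    "s4": ["A. Smith", "B. Jones"],
}


def count_support(claims):
    """Count how many sources claim each value (one vote per source per value)."""
    support = Counter()
    for values in claims.values():
        support.update(set(values))
    return support


def single_truth_vote(claims):
    """Single-truth setting: return the one value claimed by the most sources."""
    support = count_support(claims)
    # Ties are broken alphabetically so the result is deterministic.
    return min(support, key=lambda v: (-support[v], v))


def multi_truth_vote(claims, min_support=0.5):
    """Multi-truth setting: return every value claimed by at least
    `min_support` (an assumed threshold) of the sources."""
    support = count_support(claims)
    n_sources = len(claims)
    return sorted(v for v, c in support.items() if c / n_sources >= min_support)


def build_prompt(obj, attribute, claims, multi_truth, example=None):
    """Build a hypothetical fusion prompt.

    With example=None the prompt is zero-shot; passing a worked example
    string makes it one-shot. The wording is illustrative only.
    """
    lines = []
    if example is not None:
        lines.append("Example:\n" + example + "\n")
    lines.append(f"Sources disagree about the {attribute} of {obj}.")
    for source, values in claims.items():
        lines.append(f"- {source}: {', '.join(values)}")
    if multi_truth:
        lines.append("Return every correct value.")
    else:
        lines.append("Return the single correct value.")
    return "\n".join(lines)


print(single_truth_vote(claims))
print(multi_truth_vote(claims))
print(build_prompt("Book X", "author", claims, multi_truth=True))
```

In this toy input, the single-truth vote keeps only the most supported value, while the multi-truth vote keeps every value that clears the threshold. The resulting prompt string would be sent to an LLM of choice; that call is omitted here because the abstract does not specify a model or API.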