Can LLMs Credibly Transform the Creation of Panel Data from Diverse Historical Tables?

arXiv Econ

CC BY

This outlet displays the full text directly under a public or free license.

Abstract

Multimodal LLMs could bring a watershed change to the digitization of historical tables by enabling low-cost processing that is centered on domain expertise rather than technical skill.

We develop and rigorously assess an LLM-based pipeline on a new panel of historical county-level vehicle registration tables from early 20th-century U.S. state reports.

Using human-transcribed gold standard data for evaluation, the pipeline achieves an exact cell match rate of 95.4% at approximately one-fiftieth the cost of traditional outsourcing.

The pipeline performs well both at extracting table structure, where it reduces critical parsing errors from 61.4% to 0.35%, and in numerical transcription, where it exactly matches 96.7% of linked cells and achieves a mean absolute percentage error of 0.7%.
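
The two transcription metrics named above can be illustrated with a minimal sketch. It assumes that cells have already been linked between the LLM output and the gold standard, and that cells with a gold value of zero are excluded from the percentage error. The abstract does not state the paper's exact definitions, so the function names, the zero-handling rule, and the toy values below are all hypothetical.

```python
from typing import Sequence


def exact_match_rate(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Share of linked cells whose transcribed value equals the gold value."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same linked cells")
    if not gold:
        raise ValueError("no linked cells to compare")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mean_absolute_percentage_error(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Mean of |pred - gold| / |gold|, in percent.

    Assumption: cells with a gold value of zero are skipped, because the
    ratio is undefined there. The paper may handle them differently.
    """
    pairs = [(g, p) for g, p in zip(gold, pred) if g != 0]
    if not pairs:
        raise ValueError("no cells with a nonzero gold value")
    return 100.0 * sum(abs(p - g) / abs(g) for g, p in pairs) / len(pairs)


# Hypothetical toy data: four linked cells, one transcribed incorrectly.
gold = [120, 45, 1000, 8]
pred = [120, 45, 1010, 8]

match_rate = exact_match_rate(gold, pred)
mape = mean_absolute_percentage_error(gold, pred)
print(f"exact match rate: {match_rate:.2%}, MAPE: {mape:.2f}%")
```

The two numbers answer different questions: the match rate counts any deviation as a failure, while the percentage error shows how large the deviations are, which is why a single off-by-ten cell in this toy example costs a quarter of the match rate but only a small MAPE.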

The pipeline performs on par with human-based category alignment.

We also assess pipeline performance in situ with two case studies that analyze the growth and persistence of historical vehicle adoption using common regression models.

The significance and sign of effects are identical whether using LLM or gold standard data for all eight models tested, and the coefficient of interest is statistically indistinguishable in six of eight models.
