Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

arXiv CS.AI

이 뉴스, 어떠셨어요?

한 번의 탭으로 반응을 남겨요 · 로그인 불필요

CC BY

이 매체는 공공·자유 라이선스로 본문을 직접 표시합니다.

Abstract

Numerical reasoning over expert-domain tables often exhibits high in-domain accuracy but limited robustness to domain shift.

Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning.

We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner.

By decoupling domain semantics and numerical operation structure, TaNOS improves the transferability of numerical reasoning.

Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% train data, outperforming SFT baseline (73.97%) with full train data and proprietary models such as GPT-5, Gemini-2.5-Pro.

Furthermore, in the domain-shift experiments, TaNOS displays nearly-negligible cross-domain gap (<2pp) when standard SFT shows over 10pp gap.

These results suggest that structural guidance with operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision can improve the robustness of numerical reasoning across diverse expert-domain tables.

전문 보기

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

이 뉴스, 어떠셨어요?

Abstract

관련 뉴스

'research' 카테고리 뉴스

What Drives Interactive Improvement from Feedback?

Contrastive Reflection for Iterative Prompt Optimization

How Can AI Find My Model? A Model-Finding Experimental Study Considering Data Formats, Embeddings, and Retrieval Strategies

arXiv의 다른 기사

Beyond expert users: agents should help users construct preferences, not just elicit them

Investigating Multi-Agent Deliberation in Law

Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering