From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models

arXiv CS.AI

CC BY

Abstract

Probe-based uncertainty estimation (UE) has emerged as a prominent approach to detect hallucinations in Large Language Models (LLMs) by learning uncertainty from internal model signals. Yet, recent methods vary simultaneously across feature design, training data construction, and evaluation setting, obscuring what actually drives performance. To address this issue, we propose a factorised study of probe-based UE under matched conditions. Our results show that raw hidden states and attention features are difficult to outperform in-domain. However, under distribution shift, structured and compressed features are more robust, suggesting that in-domain performance alone is insufficient to measure progress. Furthermore, prompting and label construction significantly affect probe behaviour. Building on these best-practice findings, we train benchmark-based pretrained probes that transfer reasonably well to open-ended factual generation, providing a stable off-the-shelf baseline. Our work encourages more deployment-oriented evaluation of probe-based uncertainty estimators. The code repository is available at this https URL.
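
As a rough illustration of the setup the abstract describes, a probe can be thought of as a small classifier trained on internal model features and scored both in-domain and under shift. The sketch below is not the paper's method or code. It assumes synthetic Gaussian vectors in place of real hidden states, a plain logistic-regression probe, AUROC as the metric, and a weakened label signal as a crude stand-in for distribution shift. All function names and parameters are hypothetical.

```python
import math
import random


def make_states(n, dim, signal, rng):
    """Synthetic stand-in for hidden states; label 1 marks an incorrect answer."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal * label  # assumption: one coordinate carries the label
        data.append((vec, label))
    return data


def score(w, b, vec):
    """Probe output: estimated probability that the answer is incorrect."""
    z = b + sum(wi * xi for wi, xi in zip(w, vec))
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(data, steps=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for vec, label in data:
            err = score(w, b, vec) - label
            gb += err
            for i, xi in enumerate(vec):
                gw[i] += err * xi
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(w, b, data):
    scores = [score(w, b, vec) for vec, _ in data]
    labels = [label for _, label in data]
    return auroc(scores, labels)


rng = random.Random(0)
train = make_states(400, 8, signal=2.0, rng=rng)
in_domain = make_states(500, 8, signal=2.0, rng=rng)
shifted = make_states(500, 8, signal=1.0, rng=rng)  # weaker signal as toy "shift"

w, b = train_probe(train)
auc_in_domain = evaluate(w, b, in_domain)
auc_shifted = evaluate(w, b, shifted)
print(f"in-domain AUROC: {auc_in_domain:.3f}, shifted AUROC: {auc_shifted:.3f}")
```

Reporting the two scores side by side mirrors the abstract's point that in-domain performance alone can hide how a probe behaves once the evaluation data differs from the training data.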
