From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models
Abstract
Probe-based uncertainty estimation (UE) has emerged as a prominent approach for detecting hallucinations in Large Language Models (LLMs) by learning uncertainty from internal model signals.
Yet recent methods differ simultaneously in feature design, training data construction, and evaluation setting, which obscures what actually drives performance.
To address this issue, we propose a factorised study of probe-based UE under matched conditions.
Our results show that raw hidden states and attention features are difficult to outperform in-domain.
However, under distribution shift, structured and compressed features are more robust, suggesting that in-domain performance alone is insufficient to measure progress.
Furthermore, prompting and label construction significantly affect probe behaviour.
Building on these best-practice findings, we train benchmark-based pretrained probes that transfer reasonably well to open-ended factual generation, providing a stable off-the-shelf baseline.
Our work encourages more deployment-oriented evaluation of probe-based uncertainty estimators.
The code repository is available at this https URL.
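
To make the evaluation setting concrete, the following is a minimal sketch of a linear probe scored by AUROC on an in-domain held-out set and on a shifted set. It is not the authors' implementation. The features are synthetic Gaussians standing in for hidden-state vectors, and the helper names (`make_features`, `train_probe`, `auroc`), the `signal` parameter, and the label convention (1 = hallucinated) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_features(n, dim, signal, rng):
    """Synthetic stand-in for per-answer hidden-state features (hypothetical).

    Label 1 = hallucinated, 0 = correct (assumed convention). Only the first
    two dimensions separate the classes, by `signal`; the rest are noise.
    """
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(signal if (y == 1 and d < 2) else 0.0, 1.0)
             for d in range(dim)]
        xs.append(x)
        ys.append(y)
    return xs, ys


def train_probe(xs, ys, lr=0.1, epochs=200):
    # Logistic-regression probe fitted with full-batch gradient descent.
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for d in range(dim):
                gw[d] += err * x[d]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(probe, x):
    # Probe output: estimated probability that the answer is hallucinated.
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def auroc(scores, labels):
    # Probability that a positive outranks a negative; ties count as half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
train_x, train_y = make_features(400, 8, signal=1.5, rng=rng)
test_x, test_y = make_features(200, 8, signal=1.5, rng=rng)    # in-domain
shift_x, shift_y = make_features(200, 8, signal=0.5, rng=rng)  # weaker signal

probe = train_probe(train_x, train_y)
in_domain = auroc([score(probe, x) for x in test_x], test_y)
shifted = auroc([score(probe, x) for x in shift_x], shift_y)
print(f"in-domain AUROC: {in_domain:.3f}  shifted AUROC: {shifted:.3f}")
```

Reporting both numbers side by side mirrors the abstract's point that in-domain performance alone does not show how a probe behaves under distribution shift. In a real study the synthetic vectors would be replaced by features extracted from the model.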