The Transformer as a Polar State Estimator
Abstract
We show that the core components of the Transformer -- attention, residual connections, and normalization -- arise naturally from a single geometric state estimation problem.
Modeling the latent state in polar form, with direction constrained to the hypersphere and uncertainty decomposed into radial and tangential components, yields a precision-weighted filtering procedure in which normalization enforces the hyperspherical constraint, attention aggregates directional evidence, and residual connections implement incremental state updates.
Under suitable first-order approximations, this estimator reduces to the standard Transformer block with rotary positional encodings, showing that its architecture follows from the underlying estimation problem rather than from independent design choices.
Retaining higher-order geometric corrections yields the proposed Polar Transformer, which more faithfully approximates the underlying radial-tangential state estimator.
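To make the correspondence in the abstract concrete, the following is a minimal sketch of a standard single-head, pre-norm attention block, annotated with the estimation-theoretic role the abstract assigns to each component. It is an illustration under stated assumptions, not the paper's estimator: the function names (`normalize`, `softmax`, `block`) and the weight matrices `Wq`, `Wk`, `Wv`, `Wo` are hypothetical, and rotary positional encodings, the feed-forward sublayer, learned normalization gains, and the higher-order corrections of the Polar Transformer are all omitted.

```python
import numpy as np

def normalize(x, eps=1e-8):
    """Project each row onto the unit hypersphere (gain-free normalization).

    In the abstract's reading, this enforces the hyperspherical constraint
    on the direction of the latent state.
    """
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def block(x, Wq, Wk, Wv, Wo):
    """One causal, single-head, pre-norm attention block (illustrative only).

    x: (T, d) array of token states; Wq, Wk, Wv, Wo: (d, d) hypothetical weights.
    """
    T = x.shape[0]
    u = normalize(x)                      # directions on the hypersphere
    q, k, v = u @ Wq, u @ Wk, u @ Wv      # queries, keys, values
    scores = q @ k.T / np.sqrt(q.shape[-1])
    causal = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(causal, scores, -np.inf)  # each token sees only the past
    w = softmax(scores)                   # normalized weights over the evidence
    update = (w @ v) @ Wo                 # aggregated directional evidence
    return x + update                     # residual: incremental state update
```

Here the softmax weights stand in for the normalized weighting of evidence that the abstract describes as precision-weighted filtering; how those weights relate to radial and tangential uncertainty is the subject of the paper and is not derived in this sketch.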