Lyapunov-Certified Direct Switching Theory for Q-Learning
이 뉴스, 어떠셨어요?
한 번의 탭으로 반응을 남겨요 · 로그인 불필요
Abstract
Q-learning is a fundamental algorithmic primitive in reinforcement learning.
This paper develops a new framework for analyzing Q-learning from a switching linear system (SLS) viewpoint.
In particular, we derive a stochastic SLS representation of the Q-learning error, and a finite-time error analysis through the joint spectral radius (JSR) of the corresponding SLS model, where the JSR is the exact worst-case exponential rate of the associated SLS.
To the best of our knowledge, this is the first convergence rate analysis of standard Q-learning whose leading exponential rate is expressed through the JSR.
The resulting rate is tied to the intrinsic worst-case exponential rate of the direct SLS representation and can be sharper than row-sum upper bounds when those bounds are conservative.