Constrained Policy Optimization with Cantelli-Bounded Value-at-Risk
Abstract
We introduce Canary, a risk-averse method for reinforcement learning (RL) problems with Value-at-Risk (VaR) constraints.
We employ Cantelli's inequality to obtain a tractable, conservative and smooth bound on the VaR constraint based on the first two moments of the cost return.
This yields a constraint estimator that remains stable with tight violation thresholds in dense cost regimes.
Extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we further provide worst-case bounds for both policy improvement and constraint violation during the training process.
Empirically, across continuous-control safety benchmarks, Canary satisfies its constraint most reliably, with the fewest violations and the earliest permanent satisfaction, while remaining reward-competitive with the baselines that also satisfy the constraint.
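
To make the Cantelli step concrete, the sketch below shows the standard moment-based bound that follows from Cantelli's inequality: for a cost return C with mean mu and standard deviation sigma, P(C >= mu + sigma * sqrt((1 - delta) / delta)) <= delta, so that quantity upper-bounds the VaR at level 1 - delta. The function names, the `delta` parameter, and the sample data are illustrative assumptions, not the paper's notation or implementation, and the paper's estimator may differ in how the moments are obtained.

```python
import math
import statistics


def cantelli_var_bound(cost_returns, delta):
    """Conservative upper bound on the (1 - delta)-level VaR of the cost return.

    Uses only the first two moments. Hypothetical helper for illustration.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    mean = statistics.fmean(cost_returns)
    # Population standard deviation, so the bound holds exactly for the
    # empirical distribution of the samples.
    std = statistics.pstdev(cost_returns)
    return mean + std * math.sqrt((1.0 - delta) / delta)


def exceedance_fraction(cost_returns, threshold):
    """Fraction of sampled cost returns at or above the threshold."""
    return sum(c >= threshold for c in cost_returns) / len(cost_returns)


# Made-up cost returns, for illustration only.
costs = [0.0, 1.0, 2.0, 3.0, 4.0]
bound = cantelli_var_bound(costs, delta=0.2)
```

Under this reading, a sufficient condition for the constraint P(C >= d) <= delta is that the bound does not exceed the threshold d. The bound is smooth in the mean and standard deviation, which is consistent with the abstract's description of a tractable and smooth surrogate, though it can be loose for light-tailed cost distributions.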