Constrained Policy Optimization with Cantelli-Bounded Value-at-Risk

arXiv Stat

CC BY

Abstract

We introduce Canary, a risk-averse method designed to optimize Value-at-Risk (VaR) constrained reinforcement learning (RL) problems.

We employ Cantelli's inequality to obtain a tractable, conservative and smooth bound on the VaR constraint based on the first two moments of the cost return.
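For reference, the standard Cantelli (one-sided Chebyshev) inequality yields the following moment-based bound. The notation is assumed here and is not taken from the paper: $C$ is the cost return, $\mu = \mathbb{E}[C]$, $\sigma^2 = \operatorname{Var}[C] > 0$, $\alpha \in (0,1)$ is the VaR confidence level, and $d$ is the violation threshold. The exact surrogate Canary optimizes may differ in form.

```latex
\Pr\left(C - \mu \ge \lambda\right) \le \frac{\sigma^2}{\sigma^2 + \lambda^2}, \qquad \lambda > 0 .
% Choose \lambda so that the right-hand side equals 1 - \alpha:
\frac{\sigma^2}{\sigma^2 + \lambda^2} = 1 - \alpha
\;\Longrightarrow\;
\lambda = \sigma \sqrt{\frac{\alpha}{1 - \alpha}} .
% Then \Pr(C \le \mu + \lambda) \ge \alpha, so with
% \mathrm{VaR}_\alpha(C) = \inf\{ t : \Pr(C \le t) \ge \alpha \}:
\mathrm{VaR}_\alpha(C) \le \mu + \sigma \sqrt{\frac{\alpha}{1 - \alpha}},
\qquad
\mu + \sigma \sqrt{\frac{\alpha}{1 - \alpha}} \le d
\;\Longrightarrow\;
\mathrm{VaR}_\alpha(C) \le d .
```

The surrogate on the left of the final implication depends only on the first two moments. It is a differentiable function of $\mu$ and $\sigma$ for $\sigma > 0$, which is what makes it tractable and smooth. Because it upper-bounds the true VaR for any distribution with those moments, it is also conservative.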

This yields a constraint estimator that remains stable under tight violation thresholds in dense-cost regimes.
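A minimal sketch of such a moment-based estimator, assuming the constraint is checked from a batch of sampled episode cost returns. The function names (`cantelli_var_bound`, `empirical_var`, `constraint_satisfied`) are hypothetical and are not the paper's code. The sketch also does not show how Canary differentiates the surrogate inside a CPO-style update. It only compares the Cantelli bound against the empirical quantile, which the bound always dominates.

```python
import math
import random
import statistics


def cantelli_var_bound(cost_returns, alpha):
    """Hypothetical helper: Cantelli upper bound on VaR_alpha of the cost return.

    Uses only the sample mean and sample standard deviation:
        VaR_alpha(C) <= mu + sigma * sqrt(alpha / (1 - alpha)).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    mu = statistics.fmean(cost_returns)
    # Sample std (n - 1 denominator) is never smaller than the population std,
    # so the bound stays conservative for the empirical distribution too.
    sigma = statistics.stdev(cost_returns)  # requires at least 2 samples
    return mu + sigma * math.sqrt(alpha / (1.0 - alpha))


def empirical_var(cost_returns, alpha):
    """Empirical alpha-quantile: smallest sample t with P_hat(C <= t) >= alpha."""
    ordered = sorted(cost_returns)
    k = math.ceil(alpha * len(ordered)) - 1
    return ordered[max(k, 0)]


def constraint_satisfied(cost_returns, alpha, threshold):
    """Conservative check: True implies the estimated VaR constraint holds."""
    return cantelli_var_bound(cost_returns, alpha) <= threshold


if __name__ == "__main__":
    random.seed(0)
    # Synthetic cost returns, purely for illustration (not data from the paper).
    costs = [random.gauss(10.0, 2.0) for _ in range(1000)]
    print("empirical VaR_0.95:", empirical_var(costs, 0.95))
    print("Cantelli bound    :", cantelli_var_bound(costs, 0.95))
    print("satisfied at d=25 :", constraint_satisfied(costs, 0.95, 25.0))
```

The moment-based surrogate trades tightness for stability. It needs only a mean and a variance estimate, both smooth in the samples. An empirical quantile, by contrast, depends on a single order statistic and can jump from batch to batch.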

Extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we further provide worst-case bounds for both policy improvement and constraint violation during the training process.

Empirically, across continuous-control safety benchmarks, Canary satisfies its constraint most reliably, with the fewest violations and the earliest permanent satisfaction, while remaining reward-competitive with the other baselines that also satisfy their constraints.
