AETDICE: Unified Framework and Offline Optimization for Nonlinear Multi-Objective RL
Abstract
Optimizing nonlinear preferences in multi-objective reinforcement learning (MORL) is essential for capturing complex trade-offs like risk aversion or fairness.
However, such non-linearity has historically bifurcated nonlinear MORL objectives into two distinct paradigms: Scalarized Expected Return (SER) and Expected Scalarized Return (ESR).
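To make the distinction concrete, the sketch below contrasts the two criteria under their standard definitions: SER applies the utility to the expected return vector, f(E[R]), while ESR takes the expectation of the utility of each episode's return, E[f(R)]. The `utility` function (a worst-objective, fairness-style utility) and the two-outcome return distribution are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import numpy as np

def utility(returns):
    # Hypothetical nonlinear utility: value of the worst-off objective.
    return np.min(returns, axis=-1)

# Assumed toy distribution: two equally likely episodes,
# each yielding a 2-objective return vector.
returns = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

# SER: scalarize the expected return vector, f(E[R]).
ser = utility(returns.mean(axis=0))

# ESR: average the scalarized per-episode returns, E[f(R)].
esr = utility(returns).mean()

print(f"SER = {ser}, ESR = {esr}")
```

With a linear utility the two quantities coincide; here they differ because the minimum does not commute with the expectation, which is why the two criteria call for different optimization strategies.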
SER requires global-level optimization and ESR requires non-Markovian policies, which has led to fragmented optimization strategies. We bridge this divide through the Aggregation-Expectation-Transformation (AET) framework.
By unifying both criteria through a tripartite decomposition of scalarization, AET provides a principled foundation for general nonlinear MORL.
Building on this framework, we propose AETDICE, a tractable offline RL algorithm for AET objectives.
By utilizing DICE-style density-ratio estimation in an augmented state space, AETDICE enables sample-based optimization from static datasets.
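As a rough illustration of the density-ratio idea behind DICE-style methods in general, the sketch below reweights samples from a static dataset by the ratio w = d_pi / d_D between a target policy's state-action occupancy and the dataset distribution, so that a dataset average estimates the policy's expected reward vector. Everything here is an assumption for illustration: the tabular setting, the names `d_data`, `d_pi`, and `rewards`, and the fact that the ratio is given exactly. AETDICE itself estimates the ratio and works in an augmented state space, neither of which this toy example attempts to reproduce.

```python
import numpy as np

# Assumed tabular setting with three state-action pairs.
d_data = np.array([0.5, 0.25, 0.25])   # dataset distribution d_D (hypothetical)
d_pi = np.array([0.2, 0.3, 0.5])       # target occupancy d_pi (hypothetical)

# Hypothetical 2-objective reward vector for each state-action pair.
rewards = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])

# Density ratio; assumed known here, whereas DICE-style methods learn it.
w = d_pi / d_data

# Static dataset of state-action indices, drawn in proportion to d_D.
dataset = np.array([0, 0, 1, 2])

# Sample-based estimate of the expected reward vector under d_pi.
estimate = (w[dataset, None] * rewards[dataset]).mean(axis=0)

# Exact value for comparison, computed directly from d_pi.
exact = d_pi @ rewards

print(estimate, exact)
```

Because the estimate uses only dataset samples and the ratio, no further interaction with the environment is needed, which is the property that makes this style of estimator suitable for offline optimization.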
Our framework resolves these long-standing barriers and captures the respective trade-offs induced by the AET framework, which existing methods fail to address.