Adaptive Reward Shifting Based on Behavior Proximity for Offline Reinforcement Learning

Zhe Zhang, Xiaoyang Tan

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 4620-4628. https://doi.org/10.24963/ijcai.2023/514

One of the major challenges in current offline reinforcement learning research is dealing with the distribution shift that arises when the new policy's state-action visitation differs from that of the dataset. To address this issue, we present a novel reward-shifting-based method. Specifically, to regularize the behavior of the new policy at each state, we modify the reward it receives by shifting it adaptively according to the policy's proximity to the behavior policy, applying the shift in opposite directions for in-distribution and out-of-distribution actions. In this way we guide the learning of the new policy by explicitly influencing the consequences of its actions, helping it strike a better balance between behavior constraints and policy improvement. Empirical results on the popular D4RL benchmarks show that the proposed method achieves competitive performance compared to state-of-the-art baselines.
Keywords:
Machine Learning: ML: Reinforcement learning
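
The following is a minimal sketch, not the authors' implementation, of the idea stated in the abstract: the reward is shifted up for actions that look in-distribution under the behavior policy and down for out-of-distribution ones, with the magnitude tied to behavior proximity. The behavior-density model, the threshold, the tanh shaping, and all function and parameter names here are assumptions made for illustration; the paper's actual shifting rule may differ.

# Illustrative sketch of proximity-based adaptive reward shifting.
# All names (GaussianBehaviorModel, shifted_reward, log_prob_threshold,
# shift_scale) are hypothetical and chosen for this example only.

import numpy as np


class GaussianBehaviorModel:
    """Toy state-conditioned Gaussian density over dataset actions.

    A real implementation would fit, e.g., a behavior-cloning network or a
    CVAE to the offline dataset; here the mean function is assumed given.
    """

    def __init__(self, mean_fn, std=0.3):
        self.mean_fn = mean_fn  # maps state -> mean behavior action
        self.std = std

    def log_prob(self, state, action):
        mu = self.mean_fn(state)
        # Diagonal Gaussian log-density, summed over action dimensions.
        return -0.5 * np.sum(((action - mu) / self.std) ** 2
                             + np.log(2.0 * np.pi * self.std ** 2))


def shifted_reward(reward, state, action, behavior_model,
                   log_prob_threshold=-5.0, shift_scale=1.0):
    """Shift the raw reward according to behavior proximity.

    Actions whose behavior log-density exceeds the threshold receive a
    positive shift; out-of-distribution actions receive a negative shift.
    """
    proximity = behavior_model.log_prob(state, action) - log_prob_threshold
    # proximity > 0 -> in-distribution  -> reward shifted upward
    # proximity < 0 -> out-of-distribution -> reward shifted downward
    return reward + shift_scale * np.tanh(proximity)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    behavior = GaussianBehaviorModel(mean_fn=lambda s: np.tanh(s))
    state = rng.normal(size=3)
    in_dist_action = np.tanh(state) + 0.05 * rng.normal(size=3)
    ood_action = np.tanh(state) + 2.0 * rng.normal(size=3)
    print("in-distribution shift   :", shifted_reward(1.0, state, in_dist_action, behavior))
    print("out-of-distribution shift:", shifted_reward(1.0, state, ood_action, behavior))

Running the snippet shows the intended qualitative effect: the in-distribution action's reward is pushed above the raw value of 1.0, while the out-of-distribution action's reward is pushed below it, so a value-based learner trained on the shifted rewards is discouraged from drifting far from the behavior policy.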