State Revisit and Re-explore: Bridging Sim-to-Real Gaps in Offline-and-Online Reinforcement Learning with An Imperfect Simulator
Xingyu Chen, Jiayi Xie, Zhijian Xu, Ruixun Liu, Shuai Yang, Zeyang Liu, Lipeng Wan, Xuguang Lan
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 8724-8732.
https://doi.org/10.24963/ijcai.2025/970
In reinforcement learning (RL) based robot skill acquisition, a high-fidelity simulator is usually indispensable yet unattainable, since real environment dynamics are difficult to model; this leads to severe sim-to-real gaps. Existing methods address this problem by combining offline and online RL to jointly learn transferable policies from limited offline data and an imperfect simulator. However, due to unrestricted exploration in the imperfect simulator, these hybrid offline-and-online RL methods inevitably suffer from low sample efficiency and insufficient state-action space coverage during training. To address this, we propose the State Revisit and Re-exploration (SR2) hybrid offline-and-online RL framework. In particular, the proposed algorithm employs a meta-policy and a sub-policy: the meta-policy identifies high-quality states in the offline trajectories from which to launch online exploration, and the sub-policy learns the robot skill from mixed offline and online data. By introducing this state revisit and re-explore mechanism, our approach efficiently improves performance on a set of sim-to-real robotic tasks. Through extensive experiments on simulated and real-world tasks, we demonstrate the superior performance of our approach over other state-of-the-art methods.
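The abstract outlines SR2's two-level structure: a meta-policy that selects revisit states from offline trajectories, and a sub-policy that re-explores from those states and trains on mixed offline and online data. Below is a minimal Python sketch of that loop under assumptions the abstract does not specify: it assumes the simulator exposes a `reset_to(state)` call, and the names `select_revisit_state`, `collect_online_rollout`, and `sample_mixed_batch` are hypothetical stand-ins, not the authors' implementation.

```python
import random

# Minimal sketch of a state-revisit-and-re-explore loop. It assumes the
# imperfect simulator can be reset to an arbitrary state via
# `reset_to(state)`; all names are hypothetical illustrations, not the
# authors' implementation.

def select_revisit_state(offline_trajectories, value_fn, top_k=10):
    """Meta-policy stand-in: rank offline states by an estimated value
    and sample one of the top-k as the point to revisit."""
    states = [s for traj in offline_trajectories for s in traj]
    ranked = sorted(states, key=value_fn, reverse=True)
    return random.choice(ranked[:top_k])

def collect_online_rollout(simulator, sub_policy, start_state, horizon=50):
    """Re-explore from the revisited state with the sub-policy and
    return the online transitions collected along the way."""
    transitions = []
    state = simulator.reset_to(start_state)
    for _ in range(horizon):
        action = sub_policy(state)
        next_state, reward, done = simulator.step(action)
        transitions.append((state, action, reward, next_state, done))
        if done:
            break
        state = next_state
    return transitions

def sample_mixed_batch(offline_data, online_data, batch_size=256, online_ratio=0.5):
    """Draw a training batch mixing offline and online transitions,
    since the sub-policy learns from both data sources."""
    n_online = min(int(batch_size * online_ratio), len(online_data))
    batch = random.sample(online_data, n_online)
    n_offline = min(batch_size - n_online, len(offline_data))
    batch += random.sample(offline_data, n_offline)
    return batch
```

In a full training loop, the sub-policy would then be updated by any off-the-shelf offline-and-online RL learner on batches drawn from `sample_mixed_batch`, while the meta-policy's state selection would itself be learned rather than the fixed value-ranking heuristic sketched here.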
Keywords:
Robotics: ROB: Learning in robotics
Machine Learning: ML: Reinforcement learning
