Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation (Extended Abstract)

Enpeng Yuan, Wenbo Chen, Pascal Van Hentenryck

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Journal Track. Pages 6990-6994. https://doi.org/10.24963/ijcai.2023/796

Idle vehicle relocation is crucial for addressing the demand-supply imbalances that frequently arise in ride-hailing systems. The two mainstream methodologies, optimization and reinforcement learning, each suffer from significant computational drawbacks. Optimization models must be solved in real time and often trade off model fidelity (and hence solution quality) for computational efficiency. Reinforcement learning is expensive to train and often struggles to achieve coordination among a large fleet. This paper designs a hybrid approach that leverages the strengths of both while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is efficient to train and deploy, and achieves better results than reinforcement learning or optimization alone. Numerical experiments on the New York City dataset show that RLOP significantly reduces both relocation costs and computation time compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity.
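To make the two-phase recipe concrete, the following is a minimal sketch under toy assumptions: states are per-zone supply/demand vectors, a hypothetical oracle_relocation function stands in for the optimization model whose solutions the proxy imitates, and a simple coverage reward drives the reinforcement-learning refinement. All names, network sizes, and hyperparameters here are illustrative assumptions, not the paper's actual formulation.

```python
# Sketch of the RLOP two-phase idea: (1) train a proxy to imitate an
# optimization model's relocation decisions, (2) refine it with RL.
# oracle_relocation and reward are hypothetical stand-ins.
import torch
import torch.nn as nn

ZONES = 10

def oracle_relocation(state):
    # Stand-in for the optimization model: relocate idle vehicles
    # toward zones in proportion to their unmet demand.
    supply, demand = state[:ZONES], state[ZONES:]
    gap = torch.clamp(demand - supply, min=0.0)
    return gap / (gap.sum() + 1e-8)  # target relocation distribution

proxy = nn.Sequential(nn.Linear(2 * ZONES, 64), nn.ReLU(),
                      nn.Linear(64, ZONES))
opt = torch.optim.Adam(proxy.parameters(), lr=1e-3)

# Phase 1: supervised pre-training of the proxy on oracle solutions.
for _ in range(500):
    state = torch.rand(2 * ZONES)
    target = oracle_relocation(state)
    pred = torch.softmax(proxy(state), dim=-1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()

def reward(state, zone):
    # Toy reward: sending a vehicle to an undersupplied zone pays off.
    supply, demand = state[:ZONES], state[ZONES:]
    return (demand[zone] - supply[zone]).item()

# Phase 2: refine the pre-trained proxy with a policy gradient
# (REINFORCE) step, moving it beyond pure imitation.
for _ in range(500):
    state = torch.rand(2 * ZONES)
    dist = torch.distributions.Categorical(logits=proxy(state))
    action = dist.sample()
    loss = -dist.log_prob(action) * reward(state, action)
    opt.zero_grad(); loss.backward(); opt.step()
```

The design point this mirrors is the one stated in the abstract: supervised imitation of the optimization model makes training cheap and stable, and the reinforcement-learning phase then refines the proxy beyond the solutions it started from.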
Keywords:
Machine Learning: ML: Applications
Machine Learning: ML: Reinforcement learning
Planning and Scheduling: PS: Real-time planning