Reinforcement Learning for Route Optimization with Robustness Guarantees

Tobias Jacobs, Francesco Alesiani, Gulcin Ermis

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 2592-2598. https://doi.org/10.24963/ijcai.2021/357

The application of deep learning to NP-hard combinatorial optimization problems is an emerging research trend, and a number of interesting approaches have been published over the last few years. In this work we address robust optimization, a more complex variant in which a max-min problem is to be solved. We obtain robust solutions by solving the inner minimization problem exactly and apply Reinforcement Learning to learn a heuristic for the outer problem. The minimization term in the inner objective represents an obstacle to existing RL-based approaches, as its value depends on the full solution in a non-linear manner and cannot be evaluated for the partial solutions constructed by the agent over the course of each episode. We overcome this obstacle by defining the reward in terms of the one-step advantage over a baseline policy, whose role can be played by any fast heuristic for the given problem. The agent is trained to maximize the total advantage, which, as we show, is equivalent to the original objective. We validate our approach by solving min-max versions of standard benchmarks for the Capacitated Vehicle Routing Problem and the Traveling Salesperson Problem, where our agents obtain near-optimal solutions and improve upon the baselines.
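As a rough illustration of the reward construction described above, the following Python sketch (not the authors' code) defines the per-step reward as the one-step advantage over a baseline heuristic: the change in the objective of the baseline completion when the agent extends a partial solution by one step. The functions `robust_cost` and `baseline_complete` are hypothetical placeholders for the exact inner-problem evaluation and for any fast completion heuristic, respectively.

```python
def robust_cost(solution):
    """Exact value of the inner problem for a *complete* solution.

    Problem-specific and assumed here, e.g. the worst-case tour cost
    over an uncertainty set of travel times.
    """
    raise NotImplementedError


def baseline_complete(partial_solution):
    """Complete a partial solution with any fast heuristic (the baseline policy),
    e.g. nearest-neighbour insertion for a routing problem."""
    raise NotImplementedError


def one_step_advantage_reward(partial_before, partial_after):
    """Reward for the agent's step that extends `partial_before` to `partial_after`.

    The reward is the drop in objective value of the baseline completion.
    Summed over an episode, these rewards telescope to

        robust_cost(baseline_complete(empty_solution)) - robust_cost(final_solution),

    so maximizing the total reward is equivalent (up to a constant) to
    minimizing the original objective, even though that objective cannot
    be evaluated on partial solutions directly.
    """
    cost_before = robust_cost(baseline_complete(partial_before))
    cost_after = robust_cost(baseline_complete(partial_after))
    return cost_before - cost_after
```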
Keywords:
Machine Learning: Deep Reinforcement Learning
Planning and Scheduling: Planning under Uncertainty
Machine Learning Applications: Applications of Reinforcement Learning