Logistics Distribution Route Optimization With Time Windows Based on Multi-Agent Deep Reinforcement Learning

Fahong Yu, Meijia Chen, Xiaoyun Xia, Dongping Zhu, Qiang Peng, Kuibiao Deng
DOI: 10.4018/IJITSA.342084

Abstract

The multi-depot vehicle routing problem with time windows (MDVRPTW) is a practically valuable problem in urban logistics. However, heuristic methods may fail to generate high-quality solutions for massive problem instances instantly. This article therefore presents a novel reinforcement learning algorithm that integrates a multi-head attention mechanism and a local search strategy to solve the problem efficiently. Route optimization was regarded as a vehicle tour generation process, and an encoder-decoder architecture was used to iteratively generate routes for vehicles departing from different depots. In the encoder, a multi-head attention strategy was employed to mine the complex spatiotemporal correlations within time windows. A multi-agent decoder was then designed to generate solutions by optimizing the reward and observing state transitions. Meanwhile, a local search strategy was employed to improve the quality of the solutions. The experimental results demonstrate that the proposed method significantly outperforms traditional methods in effectiveness and robustness.
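As a rough illustration of the encoder sketched in the abstract, the following minimal PyTorch example (all feature choices and dimensions are assumptions for illustration, not taken from the article) embeds customer coordinates, demand, and time-window bounds and applies multi-head self-attention:

```python
# Minimal sketch of a multi-head self-attention encoder over customer
# features (x, y, demand, tw_start, tw_end). All sizes are illustrative
# assumptions; the article's actual architecture may differ.
import torch
import torch.nn as nn

class TWEncoder(nn.Module):
    def __init__(self, feat_dim=5, embed_dim=128, n_heads=8, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)  # project raw features
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats):  # feats: (batch, n_nodes, feat_dim)
        return self.encoder(self.embed(feats))  # (batch, n_nodes, embed_dim)

# Example: 2 instances, 10 customers each, 5 features per customer
h = TWEncoder()(torch.rand(2, 10, 5))
print(h.shape)  # torch.Size([2, 10, 128])
```

In practice the raw features would be normalized and depot nodes embedded separately, but the attention machinery itself is unchanged.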

Introduction

With the rapid development of the transportation industry, increasingly stringent requirements on vehicle routing have become an emerging issue in transportation services, challenging the intelligence of vehicle management. Vehicle routing problems (VRPs) have been a subject of extensive research worldwide since their inception. To generalize the problem to a wider range of use cases, VRPs have been extended to more complicated, real-life scenarios. Imposing a fixed fleet size and tighter customer time windows transforms the traditional VRP into the vehicle routing problem with time windows (VRPTW); when vehicles may depart from multiple depots, the problem becomes the multi-depot vehicle routing problem with time windows (MDVRPTW). Even the much simpler multi-depot vehicle routing problem (MDVRP) is itself NP-hard, implying that it is unrealistic to generate optimal solutions for large-scale instances unless P = NP (Braekers & Nieuwenhuyse, 2020).
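For concreteness, a standard textbook-style MDVRPTW formulation (not reproduced from this article; the notation is assumed) minimizes total travel cost over binary arc variables subject to time-window and capacity constraints:

```latex
% Standard MDVRPTW objective (textbook form; notation assumed, not taken
% from this article). x_{ijk} = 1 iff vehicle k traverses arc (i, j);
% c_{ij} is the travel cost; t_i is the service start time at customer i
% with window [e_i, l_i]; q_i is the demand; Q is the vehicle capacity;
% R_k is the set of customers served by vehicle k.
\min \sum_{k \in K} \sum_{i \in V} \sum_{j \in V} c_{ij}\, x_{ijk}
\quad \text{s.t.} \quad
e_i \le t_i \le l_i \;\; \forall i \in C,
\qquad
\sum_{i \in R_k} q_i \le Q \;\; \forall k \in K
```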

For such complicated problems, heuristic methods are conventionally considered the viable solution tools. However, over the past few decades the logistics industry has faced a new challenge: serving massive numbers of requests instantly. Although many researchers have proposed diverse heuristics for VRPs, providing reliable solutions for city-scale problems within an acceptable amount of time remains challenging. Artificial intelligence methods have gradually evolved to tackle vehicle routing problems. Deep reinforcement learning (DRL) has become increasingly prominent in solving complex sequential decision problems, such as dynamic route choice, automated vehicle control, and emergency evacuation. DRL is the fusion of reinforcement learning (RL) and deep learning (DL), which can effectively address extremely large state and action spaces. Subsequently, several studies attempted to solve VRPs using DRL, with the encoder-decoder architecture being a popular choice for neural network design. Among these studies, an improved pointer network that simplifies the recurrent neural network (RNN) based encoder was proposed, yielding more efficient solutions to VRPs (James et al., 2019). With respect to graph representations, an improved DRL model incorporating a graph embedding network was proposed to solve VRPs (Luis et al., 2019); in this model, routing was treated as a route decoding process in which actions were chosen according to the output of the graph embedding network. Moreover, a multi-agent attention model for the multiple-vehicle routing problem with time windows was proposed (Zhang et al., 2020), which achieved superior performance over several classical heuristic baselines with negligible computation time.
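The pointer-style decoding these studies share can be sketched as follows (PyTorch; the scaled dot-product scoring and masking rule follow common practice and are assumptions for illustration, not any one cited model's exact design):

```python
# Sketch of one attention-based decoding step (pointer-network style):
# score each unvisited node against a context query and sample the next
# customer to serve.
import math
import torch

def decode_step(query, node_emb, visited):
    # query:    (batch, embed_dim)    current decoder context
    # node_emb: (batch, n, embed_dim) encoder outputs
    # visited:  (batch, n) bool mask of already-served customers
    scores = torch.einsum('bd,bnd->bn', query, node_emb)
    scores = scores / math.sqrt(node_emb.size(-1))
    scores = scores.masked_fill(visited, float('-inf'))  # forbid revisits
    probs = torch.softmax(scores, dim=-1)
    return torch.multinomial(probs, 1).squeeze(-1)       # sampled next node

q = torch.rand(2, 128)
emb = torch.rand(2, 10, 128)
mask = torch.zeros(2, 10, dtype=torch.bool)
print(decode_step(q, emb, mask))  # e.g. tensor([3, 7])
```

Masking infeasible customers (already visited, or unreachable within their time windows) before the softmax is what lets the same decoding step accommodate different constraint sets.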

Even though the proposed approaches have demonstrated performance superior to conventional methods in solving VRPs, most studies focus on addressing straightforward routing issues that are essentially linear programming problems. The MDVRPTW, with its various constraints, is considerably more challenging to solve than traditional VRPs, for three main reasons:

1. The quality of heuristic methods is often determined by the quality of the groupings, and devising grouping rules requires a substantial amount of expert domain knowledge, making it difficult to achieve optimal results.

2. Current research on DRL methods for combinatorial optimization mainly focuses on using a single agent that interacts with the environment to solve problems such as the TSP and VRPs, while research on solving the MDVRPTW is relatively scarce.

3. Compared with single-depot vehicle routing problems, the search efficiency of reinforcement learning is greatly compromised by the much larger solution space of the MDVRP. Furthermore, the transformer-style decoder fixes the order of vehicles, which restricts the agents' exploration and is no longer effective for handling vehicles that originate from different depots (see the sketch after this list).
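To make the third challenge concrete, the sketch below (PyTorch; the function name `joint_step`, the scoring rule, and all dimensions are illustrative assumptions, not the article's actual decoder) picks the acting vehicle and its next customer jointly at every step, rather than decoding vehicles in a fixed order:

```python
# Hypothetical illustration of challenge (3): instead of decoding with a
# fixed vehicle order, select the (vehicle, customer) pair jointly at
# each step, so no depot's vehicle is privileged by decoding position.
import math
import torch

def joint_step(vehicle_ctx, node_emb, visited):
    # vehicle_ctx: (batch, n_vehicles, d)  one context per depot/vehicle
    # node_emb:    (batch, n_nodes, d)     encoder outputs
    # visited:     (batch, n_nodes) bool   already-served customers
    scores = torch.einsum('bvd,bnd->bvn', vehicle_ctx, node_emb)
    scores = scores / math.sqrt(node_emb.size(-1))
    scores = scores.masked_fill(visited.unsqueeze(1), float('-inf'))
    b, v, n = scores.shape
    flat = torch.softmax(scores.reshape(b, v * n), dim=-1)
    pick = torch.multinomial(flat, 1).squeeze(-1)  # joint sample
    return pick // n, pick % n                     # which vehicle, which node

veh = torch.rand(2, 3, 128)   # 3 vehicles from different depots
emb = torch.rand(2, 10, 128)
mask = torch.zeros(2, 10, dtype=torch.bool)
print(joint_step(veh, emb, mask))
```

Because the softmax runs over all vehicle-customer pairs at once, vehicles from different depots compete on equal footing at every decoding step.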
