A Hybrid Reinforcement Learning and PSO Approach for Route Discovery of Flying Robots

Ritu Maity, Ruby Mishra, Prasant Kumar Pattnaik, Nguyen Thi Dieu Linh
DOI: 10.4018/978-1-6684-9317-5.ch002

Abstract

Route discovery is a major concern in the development of autonomous aerial vehicles. Once a path-planning algorithm has been built and the flying robot has successfully travelled from the source point to the destination point, the robot must also return to its original position, and this is handled by a route discovery algorithm. Reinforcement learning is a popular machine learning method in which the flying robot interacts with the environment and learns by exploring possible actions and maximizing reward points, without requiring a large amount of prior training data. Particle swarm optimization (PSO) is a swarm intelligence algorithm that searches for an optimal solution in a multi-dimensional space. This chapter discusses a random-exploration reinforcement learning approach combined with the PSO algorithm, used to discover the optimum path for a flying robot returning from the destination point to the source point after it has traversed its best forward path, obtained from a previously defined swarm intelligence technique. PSO combined with reinforcement learning (RL-PSO) is an optimization technique that joins the global search capability of PSO with the exploration and exploitation strategy of RL. Higher reward points are assigned to the best path already obtained from the path-planning technique, so that on the return trip the robot tries to find the route with the highest reward. Over several iterations, the algorithm optimizes and finds the best return route. The algorithm is implemented in Python, and its convergence with the number of iterations has been validated.

1. Introduction

The domain of aerial vehicles has gained importance in the last few years because drones and other aerial vehicles have a wide range of applications in retail, agriculture, disaster management, industry, the military, surveillance, health care, and other sectors. Given this range of applications, it is important to plan their paths effectively for any kind of task. Flying robots are systems capable of vertical take-off and landing without any human intervention (Feron, 2008). Path planning is one of the key features of an autonomous flying robot. We have been working on path planning for a hybrid type of fixed-wing flying robot that can be used in the healthcare sector for spraying disinfectant in operation theatres. Numerous path-planning techniques have been proposed over the years for drones and flying robots. Choosing the optimum technique is important because it must account for factors such as obstacle avoidance, energy consumption, and computational and flight time (Yang et al., 2016). Once a path-planning technique has been chosen and the flying robot has traversed from the source point to the destination point and completed its assigned task, it has to return to its initial position. To return, the flying robot has to use a route discovery technique, because a large amount of training data is not initially available to train it to traverse back autonomously. For this situation, we have used a reinforcement learning approach in which higher reward points are assigned to the forward path traversed by the flying robot. While returning, the robot randomly explores the possible paths to the initial point and, over several iterations, optimizes its route by choosing the path with the highest reward points.
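As a rough illustration of the reward idea described above, the following sketch uses tabular Q-learning on a small grid. The grid size, reward values, hyperparameters, and the names `train_return_route` and `greedy_route` are assumptions made for illustration only; the chapter preview does not specify them, and this is not the authors' implementation. "Higher reward" is modelled here as a smaller step penalty on cells of the forward path, so that the agent cannot accumulate reward by looping.

```python
import random


def train_return_route(forward_path, size=5, episodes=2000, alpha=0.5,
                       gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning for the return trip on a size x size grid.

    forward_path: list of (row, col) cells from source to destination.
    All numeric values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    start, goal = forward_path[-1], forward_path[0]  # return: destination -> source
    favoured = set(forward_path)                     # cells with the higher reward
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]       # right, left, down, up
    q = {((r, c), a): 0.0
         for r in range(size) for c in range(size) for a in range(4)}

    def step(state, action):
        r, c = state[0] + moves[action][0], state[1] + moves[action][1]
        if not (0 <= r < size and 0 <= c < size):
            return state, -2.0                       # leaving the grid is penalised
        nxt = (r, c)
        if nxt == goal:
            return nxt, 10.0                         # reached the source point
        return nxt, (-0.1 if nxt in favoured else -1.0)

    for _ in range(episodes):
        state = start
        for _ in range(4 * size * size):             # cap on episode length
            if rng.random() < epsilon:               # random exploration
                action = rng.randrange(4)
            else:                                    # exploit current estimates
                action = max(range(4), key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            best_next = 0.0 if nxt == goal else max(q[(nxt, a)] for a in range(4))
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = nxt
            if state == goal:
                break
    return q, step, start, goal


def greedy_route(q, step, start, goal, max_steps=50):
    """Follow the highest-valued action from start until the goal is reached."""
    route, state = [start], start
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[(state, a)])
        state, _ = step(state, action)
        route.append(state)
        if state == goal:
            break
    return route


# Hypothetical L-shaped forward path from source (0, 0) to destination (4, 4).
forward_path = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
                (1, 4), (2, 4), (3, 4), (4, 4)]
q, step, start, goal = train_return_route(forward_path)
print(greedy_route(q, step, start, goal))
```

Under these assumed rewards, routes that leave the forward path pay a larger penalty per step, so the learned greedy route tends to retrace the forward path in reverse.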
One of the major limitations of reinforcement learning algorithms is the difficulty of balancing exploration (i.e., trying new actions to gather information) against exploitation (i.e., using what has been learned to maximize rewards); as a result, the algorithm sometimes gets stuck in a local minimum. To overcome this shortcoming, we have used PSO along with reinforcement learning. PSO is a global optimization method that searches for a global optimum and is less prone to becoming confined to a local minimum. Here we have tried to combine the advantages of PSO and reinforcement learning to come up with an efficient model.
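For readers unfamiliar with PSO, a minimal global-best variant is sketched below on a standard multimodal test function (Rastrigin), which has many local minima. This is only a generic sketch: the preview does not describe how the authors couple PSO with reinforcement learning, and the function name `pso`, the inertia weight `w`, the coefficients `c1` and `c2`, the bounds, and the swarm size are all assumed values chosen for illustration.

```python
import math
import random


def pso(objective, dim=2, bounds=(-5.0, 5.0), particles=20, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                       # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # swarm-wide best
    history = [gbest_val]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # keep the particle inside the search bounds
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)                     # best value per iteration
    return gbest, gbest_val, history


def rastrigin(p):
    """Multimodal test function; its global minimum is 0 at the origin."""
    return 10 * len(p) + sum(x * x - 10 * math.cos(2 * math.pi * x) for x in p)


best_pos, best_val, history = pso(rastrigin)
print(best_pos, best_val)
```

The `history` list records the best objective value after each iteration and never increases, which is one simple way to inspect convergence against the number of iterations. PSO offers no guarantee of reaching the global optimum; the swarm's shared best position only makes it less likely that every particle settles in the same poor local minimum.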
