The artificial neural network (ANN) is one of the most important data mining techniques. It has been successfully applied in many fields, such as telecommunications, finance, agriculture and medicine, and is used in both unsupervised and supervised learning. The feedforward multilayer perceptron (MLP) is one of the best-known neural networks. An MLP consists of neurons organized into three layers: the first layer receives the input, the second is the hidden layer and the third produces the output. The success of an MLP generally depends on the training process, which is governed by the training algorithm. The objective of a training algorithm is to find the combination of weights and biases that minimizes the classification error.
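The layered computation described above can be sketched as follows. This is a minimal illustration, not an implementation from any of the cited studies; the layer sizes (2 inputs, 3 hidden neurons, 1 output), the random weight initialization and the sigmoid activation are all illustrative assumptions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: weighted sum of the inputs plus a bias, then activation.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of the hidden activations plus a bias.
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)
            for ws, b in zip(w_out, b_out)]

# Illustrative random weights and biases (the quantities a training
# algorithm would adjust to minimize the classification error).
random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b_hidden = [random.uniform(-1, 1) for _ in range(3)]
w_out = [[random.uniform(-1, 1) for _ in range(3)]]
b_out = [random.uniform(-1, 1)]

y = mlp_forward([0.5, -0.2], w_hidden, b_hidden, w_out, b_out)
print(y)  # a single sigmoid output in the open interval (0, 1)
```

Training consists of searching this weight-and-bias space for values that minimize the error over a labeled dataset.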
Training algorithms can be classified into two classes: gradient-based methods and stochastic search methods. Backpropagation (BP) and its variants are gradient-based methods and are among the most popular techniques used to train MLP neural networks. Gradient-based methods have several drawbacks, such as slow convergence, a high dependency on the initial values of the weights and biases, and a tendency to become trapped in local minima (Zhang, Zhang, Lok, & Lyu, 2007). To address these problems, stochastic search methods such as metaheuristics have been proposed as alternatives for training feedforward neural networks. Metaheuristics have many advantages: they apply to any type of neural network with any activation function (Kiranyaz, Ince, Yildirim, & Gabbouj, 2009), provide acceptable solutions within a reasonable time for complex and difficult problems (Raidl, 2006), and are particularly useful for dealing with large, complex problems that generate many local optima (Kenter et al., 2018; Wang, Li, Huang, & Lazebnik, 2019).
Metaheuristics can be divided into population-based algorithms and single-solution-based algorithms (Talbi, 2002; Blum, Puchinger, Raidl, & Roli, 2011). Among single-solution-based algorithms, previous studies have used simulated annealing and tabu search to train feedforward neural networks (Shaw & Kinsner, 1996; Treadgold & Gedeon, 1998; Hamm, Brorsen, & Hagan, 2009; Sexton, Alidaee, Dorsey, & Johnson, 1998; Martí & El-Fallahi, 2004). Ludermir et al. (Ludermir, Yamazaki, & Zanchettin, 2006) proposed one of the first studies on the optimization of MLP weights and architectures based on simulated annealing and tabu search. Population-based algorithms can in turn be divided into two groups: evolutionary algorithms and swarm intelligence algorithms. Among evolutionary algorithms, Yao (1993), Angeline et al. (Angeline, Saunders, & Pollack, 1994) and Palmes et al. (Palmes, Hayasaka, & Usui, 2005) employed genetic algorithms (GA); these authors showed that GA outperforms BP on real-world classification problems. As for swarm intelligence algorithms, several authors have proposed swarm-based training methods, such as particle swarm optimization (PSO) (Zhang et al., 2000), the artificial bee colony (ABC) (Nandy, 2012) and the bat algorithm (BA) (Nawi, Rehman, & Khan, 2014).
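As an illustration of the single-solution metaheuristics mentioned above, the following is a minimal simulated annealing sketch that searches the weights and bias of a single threshold neuron so as to minimize classification error on a toy task. The dataset, perturbation scale and cooling schedule are illustrative assumptions, not the setup of any cited study:

```python
import math
import random

def predict(w, x):
    # Threshold unit: weighted sum of inputs plus a bias (last entry of w).
    s = sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]
    return 1 if s > 0 else 0

def error(w, data):
    # Classification error: number of misclassified examples.
    return sum(predict(w, x) != y for x, y in data)

# Toy AND dataset standing in for a real training set.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

random.seed(1)
cur = [random.uniform(-1, 1) for _ in range(3)]
cur_e = error(cur, data)
best_w, best_e = cur[:], cur_e
temp = 1.0
for _ in range(2000):
    # Perturb the current solution; accept worse moves with a
    # temperature-dependent probability, which helps escape local minima.
    cand = [wi + random.gauss(0, 0.3) for wi in cur]
    e = error(cand, data)
    if e < cur_e or random.random() < math.exp((cur_e - e) / max(temp, 1e-9)):
        cur, cur_e = cand, e
        if e < best_e:
            best_w, best_e = cand[:], e
    temp *= 0.995  # geometric cooling schedule

print(best_e)  # misclassification count of the best solution found
```

Unlike gradient descent, this search needs only the error value, not its gradient, which is why such methods apply to any network and activation function.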
Other metaheuristic algorithms have also been applied to training feedforward MLPs, such as the modified bat algorithm (Jaddi, Abdullah, & Hamdan, 2015), Multi-Verse Optimization (MVO) (Faris, Aljarah, & Mirjalili, 2016), the Whale Optimization Algorithm (WOA) (Aljarah, Faris, & Mirjalili, 2018), the Grey Wolf Optimizer (GWO) (Hassanin, Shoeb, & Hassanien, 2016; Faris, Mirjalili, & Aljarah, 2019), Biogeography-Based Optimization (BBO) (Aljarah, Faris, Mirjalili, & Al-Madi, 2018), Moth-Flame Optimization (MFO) (Faris, Aljarah, & Mirjalili, 2017) and Improved Monarch Butterfly Optimization (IMBO) (Faris, Aljarah, & Mirjalili, 2017).
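To make the population-based, swarm-intelligence approach concrete, the following is a minimal particle swarm optimization (PSO) sketch. For brevity the objective is a simple quadratic ("sphere") function standing in for a network's training error, and the swarm size, inertia and acceleration coefficients are illustrative defaults rather than values from the cited works:

```python
import random

def sphere(pos):
    # Stand-in objective: in NN training this would be the error
    # of the network whose flattened weights/biases are `pos`.
    return sum(p * p for p in pos)

random.seed(2)
dim, n_particles, iters = 3, 20, 200
w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social coefficients

pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
vel = [[0.0] * dim for _ in range(n_particles)]
pbest = [p[:] for p in pos]          # each particle's best position
gbest = min(pbest, key=sphere)[:]    # the swarm's best position

for _ in range(iters):
    for i in range(n_particles):
        for d in range(dim):
            r1, r2 = random.random(), random.random()
            # Velocity update: inertia + pull toward personal and global bests.
            vel[i][d] = (w * vel[i][d]
                         + c1 * r1 * (pbest[i][d] - pos[i][d])
                         + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if sphere(pos[i]) < sphere(pbest[i]):
            pbest[i] = pos[i][:]
            if sphere(pbest[i]) < sphere(gbest):
                gbest = pbest[i][:]

print(sphere(gbest))  # best objective value found, near the optimum 0
```

The same loop trains an MLP once `pos` encodes the network's weights and biases and `sphere` is replaced by the classification error on the training set; this is the scheme the swarm-based trainers above share, differing mainly in how candidate solutions are generated.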