Improved Model Based on GoogLeNet and Residual Neural Network ResNet


Xuehua Huang
DOI: 10.4018/IJCINI.313442

Abstract

To improve the accuracy of image classification, an improved model is proposed. A shortcut is added to the GoogLeNet inception v1 block, and several further variants of the shortcut connection are given, named GRSN1_2, GRSN1_3, and GRSN1_4. Among them, the information of the input layer is passed directly to each subsequent layer in the form of a shortcut. The improved model combines the advantage of using multiple small convolution kernels of different sizes within the same layer with the advantage of shortcuts in reducing information loss. Meanwhile, as the number of inception blocks increases, the number of channels is increased to deepen feature extraction. The GRSN, GRSN1_2, GRSN1_3, GRSN1_4, GoogLeNet, and ResNet models were compared on the cifar10, cifar100, and mnist datasets. The experimental results show that the proposed model improves accuracy over ResNet by 3.07% on cifar10 and 2.08% on cifar100, and over GoogLeNet by 17.69% on cifar10 and 28.47% on cifar100.
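For orientation, the following is a minimal sketch of the core idea described in the abstract, assuming PyTorch: an inception-style block whose input is fed forward again through a shortcut. The branch layout, channel counts, and the 1*1 projection used to match shapes are illustrative assumptions, not the paper's exact GRSN configuration.

```python
# Sketch of an inception-style block with a shortcut (GRSN idea).
# Branch sizes and the 1x1 projection are illustrative assumptions,
# not the paper's GRSN settings.
import torch
import torch.nn as nn

class InceptionWithShortcut(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Parallel branches with small kernels of different sizes.
        self.b1 = nn.Conv2d(in_ch, out_ch // 2, 1)
        self.b3 = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1)
        # 1x1 projection so the shortcut matches the concatenated channels.
        self.proj = nn.Conv2d(in_ch, out_ch, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.b1(x), self.b3(x)], dim=1)  # multi-scale features
        return self.relu(out + self.proj(x))              # shortcut reduces information loss
```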

Introduction

Deep learning is a current research hotspot, and its results have been widely applied in image recognition, speech recognition, and other fields. This paper studies the application of deep neural network algorithms to image classification and proposes an improved neural network model based on GoogLeNet and the residual neural network ResNet. First, the mainstream deep neural network algorithms for image recognition are introduced, and the advantages and disadvantages of the GoogLeNet family of algorithms and of residual neural network algorithms are analyzed. Building on the strengths of both, an improved model, GRSN, is proposed by introducing shortcuts into the GoogLeNet inception block. Second, the detailed network structure and parameter settings of GRSN are presented, and the experimental analysis is carried out. The over-fitting behavior is then addressed. Finally, conclusions and prospects are given.

The BP neural network proposed by Rumelhart and McClelland in 1986 is a fully connected network, but it cannot achieve high performance or accuracy in image recognition. To address this problem, the CNN (Convolutional Neural Network) was proposed. It imitates the human eye's heightened sensitivity to salient features and adopts a weight-sharing structure, which greatly reduces the number of weight parameters and improves the performance and accuracy of image recognition. In 1998, LeCun proposed LeNet, a typical CNN consisting of two convolution layers, two pooling layers, and two fully connected layers. Applied to handwritten digit recognition in the United States, it achieved an error rate below 1%; however, its performance on Chinese characters and more complex images still needed improvement. In 2012, AlexNet, developed by Hinton and his student Alex Krizhevsky, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Their network adds more layers and adopts a parallel structure running on GPUs, so its image recognition performance is significantly improved. VGG uses small convolution kernels to increase the number of layers and channels of the network (Simonyan, 2014). The GoogLeNet algorithm won the ILSVRC championship in the same year. It introduces the inception block, in which convolution kernels of different sizes are used in the same layer of the network to extract features at different scales and improve the perceptive ability of the model; its 1*1 convolution kernels also achieve dimension reduction. There are four versions of GoogLeNet: v1, v2, v3, and v4, and the main difference between these versions lies in the inception module.
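To make the inception structure concrete, the following is a minimal sketch of a GoogLeNet inception v1 block, assuming PyTorch; the channel arguments are illustrative placeholders, not the paper's settings. Each branch applies a different kernel size to the same input, and the 1*1 convolutions perform dimension reduction before the larger kernels.

```python
# Sketch of a GoogLeNet inception v1 block; channel counts are
# constructor arguments, chosen per layer (illustrative, not the paper's).
import torch
import torch.nn as nn

class InceptionV1Block(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 dimension reduction, then 3x3 convolution.
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 dimension reduction, then 5x5 convolution.
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max pooling, then 1x1 projection.
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate all branches along the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
```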

The recognition accuracy of an algorithm can be improved by increasing the number of network layers: LeNet has 5 layers, AlexNet has 8, VGG has 16- and 19-layer variants, and inception v1 has 22 layers. However, Kaiming He, the designer of ResNet, conducted an experiment on the cifar10 dataset and found that the image recognition error rate of a 56-layer convolutional network was higher than that of a 20-layer one (He et al., 2016). Simply increasing the number of layers therefore degrades the performance of the model, because later features lose the original information carried by earlier features. To address this problem, he proposed adding a bypass, or shortcut, between every two blocks. Besides ResNet, residual neural network algorithms also include DenseNet and DarkNet; the main differences between these algorithms lie in the design of the shortcut and the residual block. In ResNet, a shortcut is added between every two blocks, while in DenseNet each block is connected by shortcuts to every subsequent block, which further improves accuracy.
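The shortcut mechanism can be illustrated with a basic residual block, sketched below under the same PyTorch assumption; this shows ResNet's identity bypass, not the paper's GRSN design. The 1*1 projection shortcut is an assumed detail, used only when the input and output shapes differ.

```python
# Sketch of a ResNet basic block: two 3x3 convolutions plus a shortcut
# that adds the input back, so the block learns a residual function.
import torch.nn as nn
import torch.nn.functional as F

class BasicResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Identity shortcut; a 1x1 projection when shapes differ (assumption).
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The shortcut carries the input forward, reducing information loss.
        return F.relu(out + self.shortcut(x))
```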
