Introduction
Deep learning is currently a research hot spot, and its results have been widely applied in image recognition, speech recognition, and other fields. This paper studies the application of deep neural network algorithms to image classification and proposes an improved neural network model based on GoogLeNet and the residual neural network ResNet. First, the mainstream deep neural network algorithms for image recognition are introduced, and the advantages and disadvantages of the GoogLeNet family and of residual neural network algorithms are analyzed. Building on the strengths of both, an improved model, GRSN, is proposed by introducing shortcut connections into the GoogLeNet inception block. Secondly, the detailed network structure and parameter settings of GRSN are presented. In addition, an experimental analysis is carried out. Then, the over-fitting problem is addressed. Finally, conclusions and prospects are given.
The BP neural network proposed by Rumelhart and McClelland in 1986 is a fully connected network, but it cannot deliver high performance or accuracy in image recognition. To address this problem, the CNN (Convolutional Neural Network) was proposed; it imitates the human eye's higher sensitivity to salient features and adopts a weight-sharing structure that significantly reduces the number of weight parameters while improving image recognition performance and accuracy. In 1998, LeCun proposed LeNet, a typical CNN structure consisting of two convolution layers, two pooling layers, and two fully connected layers. When this network was applied to handwritten digit recognition in the United States, the error rate was below 1%. However, its recognition performance on Chinese characters and more complex images still needed improvement. In 2012, AlexNet, developed by Hinton and his student Alex Krizhevsky, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Their network adds more layers and adopts a parallel structure to run on GPUs, so its image recognition performance is significantly improved. VGG uses small convolution kernels to increase the number of layers and channels of the network (Simonyan, 2014). The GoogLeNet algorithm won the ILSVRC championship in the same year. In this algorithm, the inception block is introduced: convolution kernels of different sizes are used in the same layer of the network to extract features at different scales and broaden the model's perception. By introducing 1×1 convolution kernels, dimension reduction can also be achieved. There are four versions of GoogLeNet (v1, v2, v3, and v4), and the main difference between these versions lies in the inception module.
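The dimension-reduction role of the 1×1 convolution mentioned above can be illustrated numerically: a 1×1 convolution is simply a linear map applied independently at each spatial position, so it can shrink the channel dimension before the more expensive 3×3 or 5×5 convolutions in an inception branch. A minimal numpy sketch (shapes and weights are arbitrary, chosen only for illustration):

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map over channels.
    x: (C_in, H, W) feature map; w: (C_out, C_in) kernel weights."""
    c_in, h, wid = x.shape
    # Flatten the spatial dims, mix channels with a matrix multiply,
    # then restore the spatial shape.
    return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wid)

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 28, 28))   # 256-channel input feature map
w = rng.standard_normal((64, 256))       # reduce 256 channels to 64

y = conv1x1(x, w)
print(y.shape)  # (64, 28, 28): same spatial size, 4x fewer channels
```

Because the spatial size is untouched, a 3×3 convolution applied after this reduction costs 256/64 = 4 times fewer multiply-accumulates per output channel, which is the efficiency gain the inception block exploits.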
The recognition accuracy of an algorithm can be improved by increasing the number of network layers: LeNet has 5 layers, AlexNet has 8, VGG has 16-layer and 19-layer variants, and Inception v1 has 22 layers. However, He Kaiming, the designer of ResNet, conducted an experiment on the CIFAR-10 data set and found that the image recognition error rate of a 56-layer convolutional network was higher than that of a 20-layer one (He et al., 2016). Simply increasing the number of layers therefore degrades the performance of the neural network model, because later features lose the original information carried by earlier features. To address this problem, he proposed adding a shortcut (bypass) connection between blocks. In addition to ResNet, residual neural network algorithms also include DenseNet and DarkNet, and the main differences among them lie in the shortcut and residual block designs. In the deep residual network ResNet, a shortcut is added between every two blocks, while in DenseNet each block is connected by shortcuts to all subsequent blocks, which further improves the accuracy of DenseNet.
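The contrast between the two shortcut schemes can be sketched abstractly: ResNet adds the block input to the block output (y = F(x) + x), while DenseNet concatenates the input with the output along the channel axis, so later layers still receive the earlier features unchanged. In the numpy sketch below, the `block` function is a hypothetical stand-in for an arbitrary channel-preserving convolutional block, not the actual ResNet or DenseNet layers:

```python
import numpy as np

def block(x):
    """Placeholder for a convolutional block F(x); a plain ReLU is used
    here only so the shortcut arithmetic can be demonstrated."""
    return np.maximum(x, 0)

def resnet_step(x):
    # ResNet shortcut: element-wise addition; channel count is unchanged,
    # so the gradient has a direct path back through the identity term.
    return block(x) + x

def densenet_step(x):
    # DenseNet shortcut: concatenation along the channel axis, so the
    # original features are passed on verbatim to every later block.
    return np.concatenate([x, block(x)], axis=0)

x = np.ones((64, 14, 14))        # 64-channel feature map
print(resnet_step(x).shape)      # (64, 14, 14)
print(densenet_step(x).shape)    # (128, 14, 14)
```

The addition in ResNet keeps the feature map size constant, while DenseNet's concatenation grows the channel count at every block, which is why DenseNet interleaves transition layers to keep the width manageable.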