Boosting Convolutional Neural Networks Using a Bidirectional Fast Gated Recurrent Unit for Text Categorization

Assia Belherazem, Redouane Tlemsani
DOI: 10.4018/IJAIML.308815

Abstract

This paper proposes a hybrid text classification model that combines a 1D CNN with a single bidirectional fast GRU (BiFaGRU), termed CNN-BiFaGRU. A single convolution layer captures features by sliding 128 kernel filters over the word embeddings; a Spatial Dropout layer then drops entire 1D feature maps, and a Max-Pooling layer combines the resulting vectors. A bidirectional CuDNNGRU block extracts temporal features; the output of this block is normalized by a Batch Normalization layer and passed to a fully connected layer. The output layer produces the final classification results. Precision and loss were used as the main criteria on five datasets (WebKB, R8, R52, AG News, and 20NG) to assess the performance of the proposed model. The results indicate that the precision of the classifier on the WebKB, R8, and R52 datasets improved significantly, from 90% up to 97%, compared with the best results achieved by other methods such as LSTM and Bi-LSTM. The proposed model thus achieves higher precision and lower loss than the other methods.

Introduction

This paper proposes an automatic approach to categorizing text using deep learning. Text categorization is the process of associating documents written in natural language with predefined classes (categories or labels) using natural language processing (NLP). Many researchers have applied deep learning architectures to text classification because they achieve high precision with little need for feature engineering. The key aspect of deep learning is that the resulting layers of features are not designed by human engineers but are instead learned from data using a general-purpose learning procedure.

In particular, the recurrent neural network (RNN) is a very powerful dynamic system and an important implementation mechanism of deep learning. RNNs can capture the dependency relationships within time series, providing more effective ways to operate on temporal memory. Loop memory can extract valuable information from historical data through memory-cell execution and other control mechanisms. Long short-term memory (LSTM) and gated recurrent units (GRUs) are two kinds of special RNN memory cells that employ different memory-cell mechanisms. LSTM and GRU networks use special hidden units whose natural function is to remember inputs for a long time (Hochreiter & Schmidhuber, 1997). Because power-load data exhibit clear time-series and cyclic characteristics, load forecasting can take advantage of historical information via LSTM and GRU cells (Zhang, Wu, et al., 2018).
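For reference, a GRU cell regulates its memory with an update gate z_t and a reset gate r_t. The equations below follow one common convention and are standard background rather than material taken from this paper:

\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z),\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r),\\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big),\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,
\end{aligned}

where \sigma is the logistic sigmoid and \odot denotes element-wise multiplication. Because the GRU has no separate output gate, it has fewer parameters than the LSTM, which is one reason GRU-based models tend to train faster.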

This paper examines multiclass automatic classification using a hybrid approach that integrates a convolutional neural network (CNN) and a bidirectional fast gated recurrent unit (BiFaGRU), termed CNN-BiFaGRU. CNN-BiFaGRU is a supervised text classifier tested on several textual datasets (Reuters8, Reuters52, WebKB, 20NewsGroup, and AG News) using the GloVe word embeddings proposed by Pennington et al. (2014). The model is evaluated using several metrics: accuracy, precision, recall, F1-score, and the confusion matrix. The results obtained are detailed in the results and discussion section.
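As a minimal sketch of how pre-trained GloVe vectors can initialize an embedding layer, the snippet below builds an embedding matrix from a GloVe file. The file name, dimensionality, and the placeholder word_index are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

EMBED_DIM = 100  # assumes the 100-d glove.6B vectors

# Parse the GloVe text file into a {word: vector} lookup.
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Placeholder vocabulary; in practice use tokenizer.word_index from Keras.
word_index = {'the': 1, 'news': 2}

# Row i holds the GloVe vector for the word with integer id i;
# words absent from GloVe keep all-zero rows.
embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector
```

The resulting matrix can then be passed to an Embedding layer as its initial weights so that training starts from the pre-trained representations.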

The main contributions of this paper are summarized as follows:

  • The implementation of a new model in which a 1D CNN followed by a single bidirectional CuDNNGRU performs the classification (a minimal sketch is given after this list).

  • The combined use of the CNN and the Bi-CuDNNGRU maximizes the potential of the text representation, with the capability to generate complex content sequences with minimal storage requirements.

  • Experiments on five commonly used datasets demonstrate that the proposed model achieves remarkable computing-time and precision performance with low loss compared with state-of-the-art methods.
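The sketch below assembles the pipeline described in the abstract with tf.keras. The kernel size, GRU width, dropout rate, vocabulary size, sequence length, and class count are illustrative assumptions, and TF2's GRU layer stands in for the deprecated CuDNNGRU (TF2 dispatches to the cuDNN kernel on GPU when the default gate settings are used):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumption
MAX_LEN = 400        # assumption
EMBED_DIM = 100      # assumption: 100-d GloVe vectors
NUM_CLASSES = 8      # assumption: e.g., the R8 dataset

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    layers.Conv1D(128, 5, activation='relu'),   # 128 filters, as in the paper
    layers.SpatialDropout1D(0.2),               # drops entire 1D feature maps
    layers.MaxPooling1D(pool_size=2),           # combines the convolved vectors
    layers.Bidirectional(layers.GRU(64)),       # bidirectional (fast) GRU block
    layers.BatchNormalization(),                # normalizes the recurrent output
    layers.Dense(64, activation='relu'),        # fully connected layer
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

SpatialDropout1D is placed after the convolution here to match the ordering in the abstract; placing it directly after the embedding layer is an equally common variant.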

The rest of the paper is organized as follows. The second section reviews related work. The third section defines the proposed model, explains how it works and how the authors implemented the hybrid model, and describes the four other models implemented for comparison with the best model. The fourth section states the problem, describes the datasets, and presents the exploratory data analysis and settings used to solve the problem; it also explains in detail how the data were prepared and represented, which word embeddings were used, how the datasets were split for training and testing, and which evaluation criteria were used to assess the performance of the proposed model. The fifth section presents the results using various word embeddings, the sixth section discusses the results obtained, and the seventh section concludes the paper.

State of the Art

Many approaches have been proposed in the past few years. Johnson and Zhang's (2015b) ConvNets model, a character-level CNN, operates only on characters and can work in different languages (Johnson & Zhang, 2015a). Kim's (2014) TextCNN model uses Word2vec (Mikolov et al., 2013); this architecture is a variant of Collobert et al.'s (2011) CNN architecture and is trained only on labeled data.
