Comparing Deep Neural Networks and Gradient Boosting for Pneumonia Detection Using Chest X-Rays


Son Nguyen, Matthew Quinn, Alan Olinsky, John Quinn
DOI: 10.4018/978-1-7998-8455-2.ch003

Abstract

In recent years, with the growth of computational power and the explosion of data available for analysis, deep neural networks, particularly convolutional neural networks, have emerged as one of the default models for image classification, outperforming most classical machine learning models in this task. On the other hand, gradient boosting, a classical model, has been widely used for structured tabular data and has been a leading model in data competitions, such as those hosted on Kaggle. In this study, the authors compare the performance of deep neural networks with gradient boosting models for detecting pneumonia using chest x-rays. The authors implement several popular deep neural network architectures, such as ResNet50, InceptionV3, Xception, and MobileNetV3, as well as variants of a gradient boosting model. The authors then evaluate these two classes of models in terms of prediction accuracy. The computation in this study was performed using cloud computing services offered by Google Colab Pro.
Chapter Preview

Introduction

In this chapter, the authors give an overview of deep neural networks, gradient boosting, and the problem of detecting pneumonia from chest x-rays. The main difference between the deep neural network models and the gradient boosting models is that deep neural networks are designed to handle image data while gradient boosting models often excel at tasks utilizing regular tabular data. Detailed structures of these models are provided below.

Neural Networks and Deep Learning

In parametric supervised learning, one wants to establish the relationship between the input values and the output value through a specific function. To find this specific function (the solution), one searches within a class of functions identified by parameters (hence the term “parametric”) for the parameter values that minimize a predetermined objective function, or loss function. For example, in linear regression, the least squares regression line is the solution found within the class of linear functions identified by the slope and intercept parameters. The slope and intercept of the least squares regression line are the values that minimize the squared loss.
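As a minimal sketch of this idea (not from the chapter; the data values are made up), the following Python snippet finds the slope and intercept that minimize the squared loss for a toy dataset:

```python
import numpy as np

# Hypothetical toy data (x = input values, y = output values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# The parametric class of functions: f(x) = slope * x + intercept.
# Least squares finds the parameter values minimizing the squared loss.
X = np.column_stack([x, np.ones_like(x)])            # design matrix [x, 1]
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

predictions = slope * x + intercept
squared_loss = np.mean((y - predictions) ** 2)
print(slope, intercept, squared_loss)
```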

A neural network represents a class of functions. The structure, or architecture, of a neural network is an arrangement of nodes connected by edges. The edges, or weights, are the parameters of these functions. In a neural network, a linear combination of the nodes in one layer is the input to a function, called an activation function, at a node in the next layer. The values in the first layer, the input values, pass through this calculation layer by layer, through all the hidden layers, to ultimately produce the output value in the output layer. Training a neural network entails finding the set of parameters, or weights, that minimizes a predetermined loss function.
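Below is a minimal numpy sketch of this forward calculation for a hypothetical network with three inputs, one hidden layer of four neurons, and one output; the layer sizes, random weights, and ReLU activation are illustrative assumptions, not the chapter's models. Dropping the hidden layer reduces the computation to a single linear combination, i.e., a linear model.

```python
import numpy as np

def relu(z):
    # Activation function applied element-wise at each hidden neuron.
    return np.maximum(0.0, z)

# Hypothetical weights: 3 inputs -> 4 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # edges into the hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # edges into the output layer

x = np.array([0.5, -1.2, 3.0])                  # values in the input layer

h = relu(x @ W1 + b1)       # linear combination, then activation (hidden layer)
y_hat = h @ W2 + b2         # linear combination at the output layer
print(y_hat)
```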

It is easy to see that the linear model is also a neural network with no hidden layers. That is, a linear model effectively takes a linear combination of the input variables to directly produce an output value in a single step. However, including hidden layers increases the flexibility of the neural network, which can help it model more complex scenarios. In fact, it has been shown that any given continuous function can be approximated to any desired precision by a neural network with only a single hidden layer (Nielsen, 2016).

Deep learning models, or deep neural networks, are neural network models with multiple hidden layers (Schmidhuber, 2015). The word “deep” indicates there are many hidden layers. In the past decades, with the explosion in the amount of available data and the rapid development of computing hardware, deep neural networks have been very successful and are now usually the default models for tasks in computer vision, such as image recognition and object detection. The initial success of deep neural networks dates back to 2012, when the AlexNet model (Krizhevsky et al., 2012) won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a top-5 error of 15.3%, 10.8 percentage points lower than the runner-up. Since then, deep neural networks have produced state-of-the-art accuracy in the ImageNet competition (He et al., 2016).

Figure 1.

Architecture of a neural network. [The circles are neurons. The connections between two neurons are the edges, or weights, which are parameters of the neural network. The first layer is the input layer, the last layer is the output layer, and the layers between the input layer and output layer are hidden layers.]

Figure 2.

Values flow from one layer to a neuron of the next layer (say, layer 2). Σ represents the linear combination of x1, x2, x3 with the corresponding coefficients, or weights, w1, w2, w3. f is the activation function used at the neuron.

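To make the computation in Figure 2 concrete, here is a small worked sketch with made-up numbers, using ReLU as the activation f:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # x1, x2, x3 from layer 1
w = np.array([0.5, -0.25, 0.1])    # weights w1, w2, w3 on the edges

z = np.dot(w, x)                   # the linear combination (the Σ step)
a = max(0.0, z)                    # the activation f, here ReLU, gives the neuron's value
print(z, a)                        # z = 0.3, so a = 0.3
```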

Key Terms in this Chapter

Learning Rate: This concept appears in several machine learning models. In neural network models, the learning rate is the fraction of the gradient by which the weight vector is moved at each update step. The learning rate affects the speed and convergence of the training process. In gradient boosting, the learning rate determines how quickly or slowly each new tree corrects the errors of the current ensemble. A learning rate that is too small might lead to a long training time, while a learning rate that is too large might prevent convergence.
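A hedged sketch of how the learning rate enters the gradient descent update, and of the effect of a rate that is too large; the function f(w) = w² and the two rates are illustrative choices, not values from the chapter:

```python
def sgd_step(w, grad, lr):
    # Move the weight a fraction (lr) of the gradient in the downhill direction.
    return w - lr * grad

# Minimizing f(w) = w**2, whose gradient is 2w, starting from w = 1.0.
for lr in (0.1, 1.1):            # a small rate converges, a too-large rate diverges
    w = 1.0
    for _ in range(20):
        w = sgd_step(w, 2 * w, lr)
    print(lr, w)                 # lr=0.1 approaches 0; lr=1.1 blows up
```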

Stump: A decision tree with only one split or two leaves. This is the simplest decision tree that can be used as a weak learner in boosting algorithms.
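For illustration only, a stump can be built in scikit-learn by limiting a decision tree to depth 1 and can then serve as the weak learner in a boosting ensemble (the parameter is named base_estimator in older scikit-learn releases):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# A stump: a decision tree limited to a single split (depth 1, two leaves).
stump = DecisionTreeClassifier(max_depth=1)

# Stumps are a common choice of weak learner in boosting.
model = AdaBoostClassifier(estimator=stump, n_estimators=100)
```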

Epoch: When training a neural network, the data points or images are passed forward and backward through the network in batches. One epoch is completed when all the data points have been passed forward and backward through the neural network once (see the sketch after the Batch entry below).

Backpropagation: Widely used in training neural networks, backpropagation is an algorithm for computing the gradient of the loss function of a neural network with respect to its weights, using the chain rule. The gradient is computed one layer at a time, iterating backward from the last layer.
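A minimal numpy sketch of the chain rule behind backpropagation, for a single neuron with a sigmoid activation and squared loss; all values are made up for illustration:

```python
import numpy as np

# Forward pass for one example: y_hat = sigmoid(w . x), loss = (y_hat - y)^2.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, -0.3])
y = 1.0

z = w @ x
y_hat = 1.0 / (1.0 + np.exp(-z))           # sigmoid activation
loss = (y_hat - y) ** 2

# Backward pass: chain the derivatives from the loss back toward the weights.
dloss_dyhat = 2 * (y_hat - y)              # d(loss)/d(y_hat)
dyhat_dz = y_hat * (1 - y_hat)             # derivative of the sigmoid
dz_dw = x                                  # d(z)/d(w)
grad_w = dloss_dyhat * dyhat_dz * dz_dw    # gradient of the loss with respect to w
```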

Batch: A batch is a subset of the data. When training a neural network, the entire dataset is divided into batches. Batch size refers to the number of data points (or images) in a batch.
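A hedged sketch of how batches and epochs fit together in a typical training loop; the dataset size, batch size, and the omitted update step are placeholders:

```python
import numpy as np

X = np.random.rand(1000, 32)     # 1,000 hypothetical data points
batch_size = 100                 # each batch holds 100 data points
num_epochs = 5

for epoch in range(num_epochs):
    indices = np.random.permutation(len(X))            # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = X[indices[start:start + batch_size]]   # one batch of data
        # ... forward pass, backpropagation, and weight update would go here ...
    # One epoch is complete: every data point has passed through the network once.
```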

Transfer Learning: Transfer learning refers to a set of techniques for carrying the knowledge acquired from learning one problem (dataset) over to another problem. For instance, a model that can recognize different types of housecats may be useful for training a model to recognize different types of lions. Transfer learning can also be useful in cases where outdated data needs to be updated (Pan, 2009).
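As one common way to apply transfer learning to the kind of task studied in this chapter, a pretrained base network can be frozen and reused as a feature extractor under a new classification head. The sketch below uses Keras and ResNet50; the input size, pooling layer, and head are illustrative assumptions, not the authors' exact setup:

```python
import tensorflow as tf

# Reuse ImageNet knowledge: load ResNet50 without its original classification head.
base = tf.keras.applications.ResNet50(weights="imagenet",
                                      include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False                        # freeze the transferred weights

# Add a small new head for the target task (e.g., pneumonia vs. normal).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```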

Gradient Descent: Gradient descent is an algorithm for finding a local minimum of a differentiable function. The algorithm starts by initializing a guess and then iteratively improves that guess by moving in the direction opposite to the gradient of the function. The algorithm stops when the gradient is (approximately) zero, indicating that the current position is at or near a local minimum.
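A minimal Python sketch of gradient descent with a gradient-based stopping rule; the target function and learning rate are illustrative:

```python
def gradient_descent(grad, w0, lr=0.1, tol=1e-8, max_iter=10_000):
    # Iteratively move opposite the gradient until it is (near) zero.
    w = w0
    for _ in range(max_iter):
        g = grad(w)
        if abs(g) < tol:          # gradient ~ 0: a local minimum has been reached
            break
        w = w - lr * g
    return w

# Illustrative: f(w) = (w - 3)^2 has gradient 2*(w - 3) and its minimum at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))   # ~3.0
```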

Data Augmentation: Data augmentation refers to a set of techniques that increases the size or improves the quality of the training data or images. Data augmentation could include multiple transformations of each input image in the training data. These transformations could be reflections, different rotations, or cropping of the original images. Data augmentation could also include artificially produced images that are generated from the input data.
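A hedged example of on-the-fly augmentation with the Keras ImageDataGenerator; the specific transformations and their ranges are illustrative choices, not the chapter's settings:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each training image is randomly transformed on the fly:
# small rotations, shifts, zooms, and horizontal flips of the original image.
augmenter = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)
# train_images and train_labels below are placeholders for the training arrays.
# train_generator = augmenter.flow(train_images, train_labels, batch_size=32)
```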

Activation Functions: In a neural network architecture, these are the predetermined functions at each node, or neuron, that determine the output value of that node. The inputs to these functions are linear combinations of the values of the nodes in the previous layer. Activation functions are often continuous functions. However, some activation functions are not differentiable everywhere, such as the Rectified Linear Unit (ReLU) function, which is not differentiable at 0.
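For illustration, the ReLU and sigmoid activation functions can be written in a few lines of numpy:

```python
import numpy as np

def relu(z):
    # Continuous everywhere, but not differentiable at z = 0.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Smooth (differentiable everywhere); squashes values into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])   # linear combinations arriving at a node
print(relu(z), sigmoid(z))
```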

Convolutional Neural Network: A neural network whose hidden layers include layers that perform convolution operations. The convolutional layers are usually followed by pooling layers and dense layers. At a convolutional layer, a filter (a vector or a multi-dimensional tensor) slides across the input neurons, taking the inner product with local patches of those neurons to produce the values for the next layer.
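A minimal Keras sketch of such a network, with convolutional and pooling layers followed by dense layers; the input size (224x224 grayscale) and layer sizes are illustrative assumptions:

```python
import tensorflow as tf

# A small convolutional network: convolution + pooling, then dense layers.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 1)),                  # grayscale input image
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # filters slide over the image
    tf.keras.layers.MaxPooling2D(pool_size=2),             # pooling layer
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),          # dense layers
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```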
