A Review of Capsule Network Limitations, Modifications, and Applications in Object Recognition

Copyright: © 2024 | Pages: 25
DOI: 10.4018/979-8-3693-2913-9.ch005

Abstract

Modern computer vision and machine learning technologies have enabled advances in many domains, including pattern recognition and image classification. One of the most powerful machine learning methods is the capsule network, which encodes features together with their hierarchical relationships. A capsule network is a type of neural network that takes an inverse-graphics view: it represents an object as a set of distinct parts and models the relationships between those parts, in contrast to CNNs, which discard much of the evidence about spatial arrangement and require large amounts of training data. Accordingly, the authors compare the capsule network designs used in diverse applications. The main contribution of this study is that it summarizes and discusses the major published capsule network architectures, including their advantages, limitations, modifications, and applications.

Introduction

Capsule networks are a class of artificial neural network used in machine learning systems (Khan et al., 2023). They are particularly well suited to describing hierarchical relationships and more closely resemble biological neural networks (Haq et al., 2019). The capsule network extends the convolutional network (Haq et al., 2023) by reusing the outputs of lower layers to build more consistent and higher-level representations within the capsules. The capsule network was designed as an alternative to the convolutional neural network (CNN), which, despite its accuracy in the areas where it is applied, exhibits several limitations in computer vision tasks; the capsule network is a novel architecture and an enhanced version of the existing neural network model, particularly for computer vision tasks (Ahmad & Adnan, 2015; Ahmad et al., 2018; Anwar, Wang, Khan et al, 2020; Campus, n.d.).

Convolutional neural networks, regarded as the foundation of image processing in deep learning (Hosni et al., 2018; Munawar et al., n.d.; Rahim, Zhong, Ahmad, Ahmad, & ElAffendi, 2023; Sohail et al., 2023), were originally developed to classify images using consecutive convolution and pooling layers (Anwar, Wang, Ahmad et al, 2020). Although accurate, the CNN suffers performance degradation because it reduces the data dimension to obtain spatial invariance, losing information (location, rotation, and features related to scale and position) that may be needed for segmentation and proper object localization (Fatima et al., 2022). This makes segmentation and detection more difficult (Patrick et al., 2022). Alternative techniques, such as end-to-end connected layers (Haq et al., 2024), reinforcement learning (Krizhevsky et al., 2012), and advanced training and design techniques for the CNN (Ullah et al., n.d.), were applied to ease segmentation and detection and to improve classification accuracy, but they proved tedious and did not yield improvements (Tahsin et al., 2023), which led to the development of a new architecture. Geoffrey Hinton proposed the capsule network as a solution to these shortcomings of the CNN.

Figure 1 depicts a conventional convolutional neural network. The convolutional layer scans the input image to extract low-level features such as edges. The ReLU function is applied to reduce computational complexity and make the model nonlinear. The pooling (down-sampling) layer saves memory and helps identify the same object across different images. Different types of pooling, such as max, min, average, and sum pooling, are used depending on the requirements. These pooling methods are shown in Figure 2, and the ReLU activation function is shown in Figure 3. A brief code sketch of these operations, together with the capsule nonlinearity that preserves the information pooling discards, is given below.
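To make the contrast concrete, the following is a minimal sketch (written here in PyTorch as an assumption; it is not taken from any of the cited works). It implements the CNN stage just described, a convolution extracting low-level features, ReLU, and max pooling for down-sampling, followed by the "squash" nonlinearity used by capsule networks, which keeps a pose vector per capsule instead of collapsing spatial detail. The names TinyCNNStage and squash, and all shapes, are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNNStage(nn.Module):
    """One conventional CNN stage: convolution -> ReLU -> max pooling."""
    def __init__(self):
        super().__init__()
        # Convolution scans the input image and extracts low-level features (e.g., edges).
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.relu(self.conv(x))            # ReLU adds nonlinearity at low computational cost
        x = F.max_pool2d(x, kernel_size=2)  # Max pooling down-samples; exact positions are lost
        return x

def squash(s, eps=1e-8):
    """Capsule 'squash' nonlinearity: shrinks short vectors toward 0 and long vectors
    toward unit length, so a capsule's length can be read as the probability that the
    entity it represents is present, while its orientation retains pose information."""
    sq_norm = (s ** 2).sum(dim=-1, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

# Usage on a dummy batch of 28x28 grayscale images.
x = torch.randn(4, 1, 28, 28)
feats = TinyCNNStage()(x)                   # -> (4, 8, 14, 14) scalar feature maps
capsules = squash(torch.randn(4, 10, 16))   # -> ten 16-D capsule vectors per example
```

The sketch highlights the design difference the chapter discusses: the CNN stage outputs scalar activations whose spatial precision is reduced by pooling, whereas a capsule layer outputs squashed vectors whose length and orientation jointly encode presence and pose.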

Figure 1. CNN structure (Anwar, Wang, Ahmad et al, 2020)

Figure 2. Pooling techniques

Figure 3. ReLU function of CNN
