Ghost Module
In object detection with deep neural networks, the preprocessed image is first fed into the model's backbone network for feature extraction. To make this stage more efficient, researchers have made significant progress on lightweight backbones. Han et al. (2020) visualized the feature maps of convolutional neural networks and observed considerable redundancy among them; they found that these redundant feature maps could be generated by operations much cheaper than standard convolution, such as linear transformations or depth-wise separable convolutions. In their proposed ghost module, depicted in Figure 1, a small number of convolutional kernels first extract features from the input feature map (the brightly colored portion of the figure). The resulting features then undergo additional cheap computations, denoted by the symbol Ф in the figure, such as linear transformations or depth-wise separable convolutions, producing the grayscale feature maps indicated by the arrows. Finally, the features from these two steps are concatenated to form the output feature map. In their experiments, the Ghost-VGG-16 model built on this design achieved the highest accuracy (93.7%) among the compared models while substantially reducing floating-point operations (FLOPs).
Figure 1. An illustration of the ghost module. Ф denotes the inexpensive operation.
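To make the two-step structure concrete, the following is a minimal sketch of a ghost module in PyTorch. It assumes the common choice of a point-wise primary convolution followed by a 3×3 depth-wise convolution as the cheap operation Ф; the kernel sizes and the ratio of intrinsic to ghost channels are hyperparameters, and this sketch follows the general recipe described by Han et al. (2020) rather than reproducing their released implementation.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Minimal ghost module sketch: a primary convolution produces part of
    the output channels, a cheap depth-wise convolution (the operation Ф)
    generates the remaining "ghost" maps, and the two are concatenated."""

    def __init__(self, in_channels, out_channels, ratio=2,
                 kernel_size=1, dw_size=3):
        super().__init__()
        primary_channels = out_channels // ratio           # intrinsic maps
        ghost_channels = out_channels - primary_channels   # cheap "ghost" maps
        # With ratio=2 and even out_channels, ghost_channels equals
        # primary_channels, so the grouped convolution below is valid.

        # Step 1: a small number of ordinary convolutional kernels.
        self.primary_conv = nn.Sequential(
            nn.Conv2d(in_channels, primary_channels, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary_channels),
            nn.ReLU(inplace=True),
        )
        # Step 2: the cheap operation Ф, here a depth-wise convolution
        # applied to the primary feature maps.
        self.cheap_op = nn.Sequential(
            nn.Conv2d(primary_channels, ghost_channels, dw_size,
                      padding=dw_size // 2, groups=primary_channels,
                      bias=False),
            nn.BatchNorm2d(ghost_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        primary = self.primary_conv(x)
        ghost = self.cheap_op(primary)
        # Concatenate intrinsic and ghost feature maps along channels.
        return torch.cat([primary, ghost], dim=1)

x = torch.randn(1, 64, 56, 56)
print(GhostModule(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56])
```

For instance, with 64 input and 128 output channels, the primary 1×1 convolution uses 64 × 64 = 4,096 weights and the depth-wise 3×3 step adds 64 × 9 = 576, roughly half the 8,192 weights of a standard 1×1 convolution producing all 128 channels directly, which illustrates where the FLOPs savings come from.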
Building on this approach, Deng et al. (2022) applied the ghost module to a classic single-stage object detection algorithm, reducing computational cost and improving detection performance. Similarly, Kong et al. (2022) reduced their model to 1/6 of its original size while improving mean average precision (mAP) by 2.9%. To increase detection speed, Cao et al. (2022) introduced the GhostNet module, and their proposed GhostNet-YOLOv5 model achieved a 4.83% gain in precision. Considering the memory and computational limits of mobile devices, Han et al. (2022) proposed the CPU-efficient Ghost (C-Ghost) and GPU-efficient Ghost (G-Ghost) modules, which better balance accuracy and speed for models deployed on heterogeneous devices, including CPUs and GPUs. Furthermore, to cope with variations in target shape, occlusion, and complex backgrounds in practical production settings, several studies (Qiu et al., 2022; H. Wang et al., 2022; Wei et al., 2022) used the ghost module to process redundant channel-feature information efficiently; in their respective domains, these detectors showed significant improvements in recall, precision, and mAP.
Despite these incremental advances in computer vision, combining efficient memory access cost (MAC) design with multi-scale edge detection has received little attention, particularly in the head of edge detectors, a critical stage for optimizing performance across scales. To address this gap, this paper presents a design for the head part of the model that incorporates the principles of GhostNet to process feature maps at different scales more efficiently.
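As a rough illustration of how such a head might be organized, the sketch below applies one ghost branch to each feature-map scale, reusing the GhostModule defined earlier. The structure, names, and channel choices here are illustrative assumptions, not the concrete design proposed in this paper.

```python
class MultiScaleGhostHead(nn.Module):
    """Hypothetical sketch: one ghost branch per pyramid level.
    Reuses the GhostModule from the earlier sketch; the actual
    head design proposed in this paper may differ."""

    def __init__(self, in_channels_per_scale, out_channels):
        super().__init__()
        self.branches = nn.ModuleList(
            GhostModule(c, out_channels) for c in in_channels_per_scale
        )

    def forward(self, multi_scale_feats):
        # One feature map per scale, e.g. strides 8/16/32 as in
        # typical FPN-style detectors (an assumption for illustration).
        return [branch(f) for branch, f
                in zip(self.branches, multi_scale_feats)]

feats = [torch.randn(1, c, s, s) for c, s in [(128, 80), (256, 40), (512, 20)]]
outs = MultiScaleGhostHead([128, 256, 512], 64)(feats)
print([o.shape for o in outs])  # three maps, each with 64 output channels
```

The intent of such a design is that each scale's features pass through a cheap ghost branch instead of a full convolution, keeping the per-scale FLOPs and memory traffic low while preserving a uniform output-channel width for subsequent prediction layers.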