YOLO-DCNet: A Semantic-Based Novel Flexible Lightweight Human Detection Algorithm

YiHeng Wu, Jiaqiang Dong, JianXin Chen
Copyright: © 2024 | Pages: 23
DOI: 10.4018/IJSWIS.339000

Abstract

Enhanced processors empower edge devices like smartphones for human detection, yet their application is constrained by algorithmic efficiency and precision. This paper introduces YOLO-DCNet, a lightweight neural network detector built upon YOLOv7-tiny. Incorporating a dynamic multi-head structural re-parameterization (DMSR) module within its backbone network enables effective processing of the features utilized in the model. To improve multi-scale feature aggregation, the model integrates a channel information compression and linear mapping (CLM) module into its feature pyramid architecture. Moreover, the optimization of training and inference performance is achieved by employing RepVGG blocks between the main computational modules of the model. Experimental data reveal that the enhanced YOLOv7-tiny model achieves a 31.7% faster inference speed and marginal gains of 0.7% in mAP@0.5 and 0.5% in mAP@0.5:0.95 over the original. This underscores the model's improved performance and applicability for real-time human detection on edge devices across diverse applications.

Ghost Module

In object detection with deep neural networks, the preprocessed image data is first fed into the model's backbone network for feature extraction. To make this step more efficient, researchers have explored a range of lightweight backbones. Han et al. (2020) visualized the feature maps of convolutional neural networks and observed redundancy among them. They found that the redundant feature maps can be obtained through relatively cheap operations, such as linear transformations or depth-wise separable convolutions. In their proposed ghost module, depicted in Figure 1, a small number of convolutional kernels first extract features from the input feature map (the brightly colored portion of the figure). These features then undergo additional cheap computations, denoted by the symbol Ф in the figure, such as linear transformations or depth-wise separable convolutions, which produce the output shown in the grayscale portion of the figure that the arrows point to. Finally, the features from the two steps are concatenated to form the output feature map. In their experiments, the Ghost-VGG-16 model, which incorporates this design, achieved the highest accuracy (93.7%) among the compared models while substantially reducing floating point operations (FLOPs).

Figure 1. An illustration of the ghost module. Ф represents the inexpensive operation.
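The saving that the ghost module aims for can be estimated with a rough multiplication count. The sketch below is illustrative only and is not taken from the article: the function names, the `ratio` parameter (how many output maps each ordinary-convolution map accounts for), the choice of a depth-wise `cheap_k` x `cheap_k` filter as the cheap operation Ф, and the layer sizes are all assumptions made for the example. Bias terms, activations, and the concatenation are ignored.

```python
def conv_mults(out_ch, in_ch, k, out_h, out_w):
    """Multiplications of an ordinary k x k convolution (bias ignored)."""
    return out_ch * out_h * out_w * in_ch * k * k


def ghost_mults(out_ch, in_ch, k, out_h, out_w, ratio=2, cheap_k=3):
    """Multiplications of a ghost-style module that emits out_ch maps.

    Assumed layout (hypothetical parameters, for illustration only):
      1. an ordinary convolution produces out_ch // ratio "intrinsic" maps;
      2. each intrinsic map yields (ratio - 1) extra maps through a cheap
         depth-wise cheap_k x cheap_k filter (the operation marked as Ф);
      3. both groups are concatenated (no multiplications needed).
    """
    if ratio < 2 or out_ch % ratio != 0:
        raise ValueError("out_ch must be divisible by ratio, and ratio >= 2")
    intrinsic = out_ch // ratio
    ghost = intrinsic * (ratio - 1)
    primary_cost = conv_mults(intrinsic, in_ch, k, out_h, out_w)
    # A depth-wise filter reads a single input map per output map.
    cheap_cost = ghost * out_h * out_w * cheap_k * cheap_k
    return {
        "intrinsic_maps": intrinsic,
        "ghost_maps": ghost,
        "mults": primary_cost + cheap_cost,
    }


# Example layer (sizes chosen arbitrarily): 64 -> 128 channels, 3 x 3, 32 x 32 output.
standard = conv_mults(128, 64, 3, 32, 32)
ghost = ghost_mults(128, 64, 3, 32, 32, ratio=2, cheap_k=3)
print("standard conv multiplications:", standard)
print("ghost module multiplications: ", ghost["mults"])
print("reduction factor: %.2f" % (standard / ghost["mults"]))
```

Under these assumptions the reduction factor comes out slightly below `ratio`, because the cheap depth-wise step adds a small cost on top of the reduced ordinary convolution.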

Building upon this approach, Deng et al. (2022) applied the ghost module to the classic single-stage object detection algorithm, reducing computational costs and improving detection performance. Similarly, Kong et al. (2022) reduced the model size to 1/6 of its original size and improved mean average precision (mAP) by 2.9%. To enhance the detection speed of the model, Cao et al. (2022) introduced the GhostNet module, and their proposed model, GhostNet-YOLOv5, achieved a 4.83% increase in precision. Considering memory and computational limitations of mobile devices, Han et al. (2022) proposed the CPU-efficient Ghost (C-Ghost) module and GPU-efficient Ghost (G-Ghost), which better balance accuracy and speed in models deployed on heterogeneous devices, including CPU and GPU. Furthermore, to address challenges posed by variations in target shapes, occlusions, and complex backgrounds in practical production applications, researchers (Qiu et al., 2022; H. Wang et al., 2022; Wei et al., 2022) implemented efficient processing of redundant information in channel features based on the ghost module. In their respective domains, these detectors exhibited significant improvements in terms of recall, precision, and mAP.

Despite these advances, combining a design that is efficient in memory access cost (MAC) with multi-scale edge detection methods retains practical value and remains underexplored, particularly in the head part of edge detectors, which is critical for optimizing performance across scales. To address this gap, this paper presents a design for the head part of the model that incorporates the principles of GhostNet, so that information from feature maps at varying scales is processed more effectively.
