A Lightweight Real-Time System for Object Detection in Enterprise Information Systems for Frequency-Based Feature Separation

YiHeng Wu, JianXin Chen
Copyright: © 2023 | Pages: 18
DOI: 10.4018/IJSWIS.330015

Abstract

In the domain of target detection in mobile and embedded devices, neural network model inference speed is a crucial metric. This paper introduces YOLO-FLNet, a lightweight algorithm for detecting people in open scenes. The model utilizes the DFEM structure to capture and process high-frequency and low-frequency information in the feature map. Additionally, the VoV-DFEM structure, based on the concept of one-shot aggregation, enhances feature aggregation from different scales and frequencies in the backbone network. To validate its performance, experiments were conducted using publicly available datasets on a computer with dedicated GPUs. As a result, compared to YOLOv7-tiny, YOLO-FLNet achieved a 0.3% mAP@0.5 improvement, reduced parameter size by 52.9%, and increased inference speed by 30.2%. These characteristics make it valuable for person detection in engineering domains, providing theoretical guidance for lightweight models in edge computing.

Backbone

In recent years, researchers have made significant advances in lightweight backbones that reduce the parameter count while improving the detection accuracy and inference efficiency of neural network models. Key considerations include model size, memory access cost (MAC), and the computational efficiency of the GPU. Taking the impact of MAC on model performance into account, Lee et al. (2019) proposed VoVNet, which adopts the one-shot aggregation (OSA) concept. VoVNet runs about twice as fast as densely connected convolutional networks and cuts energy consumption by a factor of more than 1.5. Han et al. (2020) examined the feature maps generated by residual structures and found that many of them are redundant and can be obtained from other feature maps through linear transformations. Consequently, they proposed the ghost convolution module, which processes intrinsic feature maps and ghost feature maps differently. By partitioning the features along the channel dimension into separate branches, the ghost convolution module reduces the parameter size and improves computational efficiency while maintaining the module’s feature extraction capability. It has been shown to perform well in terms of model lightweighting and runtime efficiency (Ma et al., 2023; Xu et al., 2022; Zhao et al., 2021). Jiang et al. (2022) found that when the spatial information of high-resolution feature maps interacts effectively with the semantic information in low-resolution feature maps, even models with extremely lightweight backbone networks can achieve good detection performance on the COCO dataset.

To balance more robust learning capability during training against higher efficiency during inference, Ding et al. (2021) applied the re-parameterization technique in RepVGG to adjust the model structure. This technique allows RepVGG to maintain higher accuracy while achieving an inference speed that is 83% faster than the classic deep residual network. To make the model lightweight and more robust against interference, Zeng et al. (2022) designed the improved dense dilated convolution (IDDC) block; their LDSNet keeps the parameter count within one million while maintaining high accuracy. Huang et al. (2022) proposed the lightweight oriented object detector (LO-Det) and the dynamic receptive field (DRF) to improve detection performance, and the CSA-DRF component exhibited good efficiency and accuracy in their experiments. Mehta and Rastegari (2021) introduced the MobileViT block, a lightweight general-purpose transformer structure suitable for mobile devices. As a backbone, it reduces the parameter size by over 90% compared to ResNet-101.

These efforts have yielded practical achievements in various fields. However, the loss of feature extraction capability that comes with building lightweight models cannot be ignored. In addition, feature maps always carry specific frequency information as they propagate through a network, yet none of the abovementioned methods includes a structure specialized for extracting such frequency-based information. This research therefore proposes a method that fuses features of different information frequencies in the feature maps to minimize the loss in precision.
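
As a rough illustration of why the ghost convolution module discussed above reduces the parameter size, the sketch below counts the weights of a plain convolution and of a ghost module producing the same number of output channels. It assumes the commonly described formulation in which a primary convolution produces c_out / s intrinsic feature maps and cheap depthwise d x d operations generate the remaining ghost maps. Biases are ignored. The function names, the ratio s, the kernel size d, and the channel sizes are illustrative assumptions and do not come from the article.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a plain k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weight count of a ghost module (bias ignored), under the stated assumptions.

    s is the assumed ratio between total and intrinsic feature maps;
    d is the assumed kernel size of the cheap depthwise operations.
    """
    if c_out % s != 0:
        raise ValueError("this sketch assumes c_out is divisible by s")
    intrinsic = c_out // s                 # maps produced by the primary convolution
    primary = c_in * intrinsic * k * k     # ordinary convolution weights
    cheap = (s - 1) * intrinsic * d * d    # one d x d depthwise kernel per ghost map
    return primary + cheap


if __name__ == "__main__":
    std = standard_conv_params(64, 128, 3)
    ghost = ghost_module_params(64, 128, 3, s=2, d=3)
    print(std, ghost, round(std / ghost, 2))  # prints: 73728 37440 1.97
```

With these hypothetical sizes the ghost module needs roughly half the weights of the plain convolution, and the saving approaches a factor of s as the number of input channels grows.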