Backbone
In recent years, researchers have made significant advances in lightweight backbone design, aiming to reduce parameter counts while preserving the detection accuracy and inference efficiency of neural network models. Key considerations include model size, memory access cost (MAC), and GPU computational efficiency. Accounting for the impact of MAC on model performance, Lee et al. (2019) proposed VoVNet, which adopts one-shot aggregation (OSA); it runs twice as fast as densely connected convolutional networks and reduces energy consumption by a factor of more than 1.5. Han et al. (2020) observed that many of the feature maps generated by residual structures are redundant and can be obtained through linear transformations of others. They therefore proposed the ghost convolution module, which processes intrinsic feature maps and ghost feature maps differently: by partitioning the features along the channel dimension into separate branches, it reduces the parameter count and improves computational efficiency while preserving the module's feature extraction capability. It has been shown to perform well in model compression and runtime efficiency (Ma et al., 2023; Xu et al., 2022; Zhao et al., 2021). Jiang et al. (2022) found that when the spatial information of high-resolution feature maps interacts effectively with the semantic information of low-resolution feature maps, even models with extremely lightweight backbones can achieve good detection performance on the COCO dataset. To balance stronger learning capability during training with higher efficiency during inference, Ding et al. (2021) applied structural re-parameterization in RepVGG to adjust the model structure.
This technique allows RepVGG to maintain high accuracy while achieving an inference speed 83% faster than the classic deep residual network. To reduce model size and enhance robustness against interference, Zeng et al. (2022) designed an improved dense dilated convolution (IDDC) block; their proposed LDSNet keeps the parameter count under one million while maintaining high accuracy. Huang et al. (2022) proposed the lightweight oriented object detector (LO-Det) together with a dynamic receptive field (DRF) mechanism to improve detection performance; their CSA-DRF component showed good efficiency and accuracy in experiments. Mehta and Rastegari (2021) introduced the MobileViT block, a lightweight general-purpose transformer structure suitable for mobile devices; as a backbone, it reduces the parameter count by over 90% compared with ResNet-101. These efforts have yielded practical results in various fields. However, the loss of feature extraction capability incurred when building lightweight models cannot be ignored. Moreover, feature maps propagated through a network always carry specific frequency information, yet none of the methods above include specialized structures to extract such frequency-based information. In this research, a method is proposed that fuses features of different frequencies within the feature maps to minimize the loss in accuracy.
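The ghost convolution idea described above can be sketched in a few lines of NumPy: an ordinary convolution produces a small set of intrinsic maps, and cheap per-channel (depthwise) linear transforms generate the remaining "ghost" maps, which are concatenated with the intrinsic ones. This is a simplified illustration rather than GhostNet's actual implementation; the 1x1 primary convolution and 3x3 depthwise kernels are arbitrary choices made for the sketch.

```python
import numpy as np

def conv2d_1x1(x, w):
    # Pointwise convolution: x is (C_in, H, W), w is (C_out, C_in).
    return np.tensordot(w, x, axes=([1], [0]))

def depthwise_3x3(x, k):
    # Cheap per-channel linear transform: x is (C, H, W), k is (C, 3, 3).
    # Zero padding keeps the spatial size unchanged.
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * k[c])
    return out

def ghost_module(x, w_primary, k_cheap):
    """A ghost module sketch: a few intrinsic maps from an ordinary
    convolution, plus ghost maps from cheap linear transforms of them."""
    intrinsic = ghosts_in = conv2d_1x1(x, w_primary)   # (m, H, W)
    ghosts = depthwise_3x3(ghosts_in, k_cheap)         # (m, H, W), cheap ops
    return np.concatenate([intrinsic, ghosts], axis=0) # (2m, H, W)
```

Because the ghost half of the output is produced by depthwise transforms (9 weights per channel) instead of a full convolution over all input channels, the module's parameter count grows much more slowly with the input channel width than a standard convolution producing the same number of output maps.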
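RepVGG's re-parameterization rests on the linearity of convolution: the training-time 3x3, 1x1, and identity branches can be folded into a single 3x3 kernel for inference, since a 1x1 kernel is a 3x3 kernel that is nonzero only at its centre, and the identity map is a 3x3 kernel with a unit centre on the diagonal channels. The NumPy sketch below illustrates the fusion under simplifying assumptions (stride 1, equal input and output channels, and no batch normalization, which RepVGG also folds in practice):

```python
import numpy as np

def conv2d(x, w):
    # Direct "same"-padded convolution: x is (C_in, H, W),
    # w is (C_out, C_in, k, k), stride 1.
    C_out, C_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((C_out, H, W))
    for o in range(C_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

def reparameterize(w3, w1, channels):
    """Fold parallel 3x3, 1x1, and identity branches into one 3x3 kernel."""
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]  # 1x1 kernel sits at the centre
    for c in range(channels):
        fused[c, c, 1, 1] += 1.0         # identity branch as a delta kernel
    return fused
```

At inference time the multi-branch block is replaced by a single convolution with the fused kernel, which is what yields RepVGG's plain, fast VGG-like topology while the training-time branches provide the richer optimization landscape.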