Differential Feature Fusion, Triplet Global Attention, and Web Semantic for Pedestrian Detection

Sha Tao, Zhenfeng Wang
Copyright: © 2024 | Pages: 18
DOI: 10.4018/IJSWIS.345651

Abstract

In complex environments and crowded pedestrian scenes, the overlap or loss of local features is a pressing issue. Existing methods often struggle to balance eliminating interfering features against establishing connections between features. To address this challenge, we propose a pedestrian detection approach called Differential Feature Fusion under Triplet Global Attention (DFFTGA). The method merges feature maps of the same size from different stages to provide richer feature information. Specifically, a pixel-level Triplet Global Attention (TGA) module enhances feature representation and perceptual range, and a Differential Feature Fusion (DFF) module optimizes features between similar nodes for filtering. Together, these operations help the model focus on discriminative features, ultimately improving pedestrian detection performance. Compared with benchmark methods, DFFTGA achieves significant improvements on datasets such as CityPersons and CrowdHuman.

Introduction

Pedestrian detection is a core problem in computer vision and plays a pivotal role in applications such as autonomous driving, surveillance systems, and intelligent traffic management (Khan et al., 2023; Zarei et al., 2023; Zuo et al., 2022). However, complex environments and crowded pedestrian scenes pose significant challenges to the accuracy of traditional pedestrian detection methods. These challenges include occlusion, pedestrians at different scales, pose variations, and lighting changes (S. Li et al., 2022), all of which degrade detection accuracy. In recent years, research on semantic networks (Hu et al., 2022) has played a positive role in various fields (Mishra et al., 2021; Mishra et al., 2022; Nguyen et al., 2021) of deep learning (F. Li et al., 2022), including detection tasks (Cvitić et al., 2021; Guendouz & Amine, 2022; Ling & Hao, 2022; Zhang et al., 2023). However, research on semantic networks within computer vision (Pathoee et al., 2022) remains insufficient. This paper aims to explore the potential role of semantic networks in pedestrian detection.

The occlusion problem in crowded scenes is highly complex: pedestrians can be partially or fully occluded by other pedestrians or objects, leading to the overlap or absence of local features (Q. Li et al., 2022; Tang et al., 2023; P. Zhou et al., 2020). Finding a balance point with the traditional non-maximum suppression (NMS) method is challenging in this setting. Under the usual convention, in which a box is suppressed when its intersection over union (IoU) with a higher-scoring box exceeds the threshold, lowering the NMS threshold suppresses some true positive results, while raising it admits more false positive results. Although P. Zhou et al. (2020) and Tang et al. (2023) proposed methods that optimize NMS to better handle overlap among pedestrian instances in crowded scenes, NMS, as a dense method, requires substantial computational resources. S. Li et al. (2022), on the other hand, addressed multi-occlusion problems by designing attention modules and feature fusion methods, which somewhat reduced model complexity. However, their attention mechanism was overly simplistic, leading to a disconnect between pixel-level features. For pedestrian detection tasks, the main issue with existing methods is insufficient consideration of global feature information (Liu et al., 2023). Although some methods attempt to reduce interference from overlapping feature regions by optimizing NMS thresholds (H. He et al., 2023), this interference is inherent and cannot be completely avoided in that way. These methods therefore continue to face challenges in accuracy and robustness when detecting pedestrians in complex scenes. Moreover, because they focus primarily on optimizing NMS thresholds, they cannot capture relationships between features and so lack comprehensive consideration of global contextual information.
Hence, there is a need for an approach that can simultaneously optimize the feature extraction process, reduce the impact of overlapping feature regions on model performance, and establish relationships between features on a global scale to enhance the accuracy and robustness of pedestrian detection.
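
The threshold trade-off in NMS discussed above can be made concrete with a small sketch. The code below is not the method of any cited paper; it is a plain greedy NMS written for illustration, assuming the common convention that a box is suppressed when its IoU with an already kept, higher-scoring box exceeds the threshold. The function names (`iou`, `greedy_nms`), the box coordinates, and the scores are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float) -> List[int]:
    """Return indices of kept boxes, visited from highest to lowest score.

    A box is dropped if its IoU with any already kept box exceeds
    iou_threshold (assumed convention, see the lead-in).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Toy scene (made-up numbers): two overlapping pedestrians A and B,
# plus a near-duplicate detection of A.
boxes = [
    (0.0, 0.0, 10.0, 20.0),   # index 0: pedestrian A
    (4.0, 0.0, 14.0, 20.0),   # index 1: pedestrian B, partly occluded by A
    (0.5, 0.0, 10.5, 20.0),   # index 2: duplicate detection of A
]
scores = [0.9, 0.8, 0.7]

strict = greedy_nms(boxes, scores, iou_threshold=0.3)   # low threshold
loose = greedy_nms(boxes, scores, iou_threshold=0.95)   # high threshold
print("low threshold keeps:", strict)
print("high threshold keeps:", loose)
```

In this toy scene the low threshold keeps only pedestrian A and discards the occluded pedestrian B, while the high threshold keeps the duplicate of A as well. A middle value happens to separate the two cases here, but in real crowds the IoU between distinct pedestrians can be as high as the IoU between duplicates, which is why a single threshold is hard to tune.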
