Introduction
Pedestrian detection is a core problem in computer vision and plays a pivotal role in applications such as autonomous driving, surveillance systems, and intelligent traffic management (Khan et al., 2023; Zarei et al., 2023; Zuo et al., 2022). However, complex environments and crowded scenes pose significant challenges to the accuracy of traditional pedestrian detection methods. These challenges include occlusion, pedestrians appearing at different scales, pose variations, and lighting changes (S. Li et al., 2022), all of which degrade detection accuracy. In recent years, research on semantic networks (Hu et al., 2022) has played a positive role in various fields (Mishra et al., 2021; Mishra et al., 2022; Nguyen et al., 2021) of deep learning (F. Li et al., 2022), including detection tasks (Cvitić et al., 2021; Guendouz & Amine, 2022; Ling & Hao, 2022; Zhang et al., 2023). However, such research within computer vision (Pathoee et al., 2022) remains insufficient. This paper therefore explores the potential role of semantic networks in pedestrian detection.
Occlusion in crowded scenes is highly complex: pedestrians can be partially or fully occluded by other pedestrians or objects, so their local features overlap or are missing altogether (Q. Li et al., 2022; Tang et al., 2023; P. Zhou et al., 2020). Traditional non-maximum suppression (NMS) struggles to find a balance point in this setting. With the threshold taken as the overlap (IoU) above which a lower-scoring box is discarded, lowering it suppresses true positives that belong to heavily overlapped pedestrians, while raising it lets more false positives through. Although P. Zhou et al. (2020) and Tang et al. (2023) proposed methods that optimize NMS to better handle overlapping pedestrian instances in crowded scenes, NMS operates on dense predictions and requires substantial computational resources. S. Li et al. (2022), on the other hand, addressed multi-occlusion problems by designing attention modules and feature fusion methods, which somewhat reduced model complexity. However, their attention mechanism was overly simplistic, leading to a disconnect between pixel-level features. For pedestrian detection tasks, the main weakness of existing methods is insufficient consideration of global feature information (Liu et al., 2023). Although some methods attempt to reduce interference from overlapping feature regions by optimizing NMS thresholds (H. He et al., 2023), this interference is inherent to the approach and cannot be completely avoided. These methods therefore continue to face challenges in accuracy and robustness when detecting pedestrians in complex scenes. Moreover, because they focus primarily on optimizing NMS thresholds, they cannot capture relationships between features and so fail to account comprehensively for global contextual information.
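The threshold trade-off described above can be illustrated with a minimal sketch of greedy NMS. This is not the implementation used in any of the cited works. It assumes that the threshold in question is the IoU overlap threshold, that boxes are given as (x1, y1, x2, y2) tuples, and that a box is suppressed when its IoU with an already kept box exceeds the threshold. The function names and the three toy boxes are hypothetical, chosen only for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical toy scene: two real pedestrians standing close together
# (indices 0 and 1) plus a duplicate detection of pedestrian 0 (index 2).
boxes = [(0, 0, 10, 20), (4, 0, 14, 20), (1, 0, 11, 20)]
scores = [0.9, 0.8, 0.7]

low = greedy_nms(boxes, scores, 0.3)   # [0]: pedestrian 1 is wrongly suppressed
mid = greedy_nms(boxes, scores, 0.5)   # [0, 1]: both pedestrians, no duplicate
high = greedy_nms(boxes, scores, 0.9)  # [0, 1, 2]: the duplicate survives
```

In this toy scene a middle threshold happens to work, but only because the two pedestrians overlap less (IoU of about 0.43) than the duplicate overlaps its source (IoU of about 0.82). When occluded pedestrians overlap as strongly as duplicates do, no single threshold separates the two cases. Practical pipelines would normally rely on a vectorized routine such as `torchvision.ops.nms` rather than this pure-Python loop.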
Hence, there is a need for an approach that can simultaneously optimize the feature extraction process, reduce the impact of overlapping feature regions on model performance, and establish relationships between features on a global scale to enhance the accuracy and robustness of pedestrian detection.