Reverse Pyramid Attention Guidance Network for Person Re-Identification

Jiang Liu, Wei Bai, Yun Hui
DOI: 10.4018/IJCINI.349982

Abstract

Person re-identification aims to retrieve pedestrians with the same identity across different cameras. However, when dealing with complex backgrounds and occlusion, current methods often direct attention to interfering regions, especially when those regions contain features similar to the target. To enhance the robustness of the model, we propose the Reverse Pyramid Attention Guidance (RPAG) network, which uses a reverse pyramid structure to learn features at multiple granularities. To mitigate the impact of occlusion, we introduce the Similar Feature Filtering (SFF) attention module at the pixel level, which uses graph convolution to adaptively select occluded regions and enhances retrieval accuracy by filtering out irrelevant parts. Combining the reverse pyramid structure with the pixel-level attention module strengthens adaptability to complex scenes, guides multi-granularity feature learning, and effectively handles various occlusion scenarios. RPAG achieved Rank-1 accuracies of 96.2%, 93.2%, 88.7%, and 73.2% on the Market1501, DukeMTMC-ReID, MSMT17, and Occluded-Duke datasets, respectively.

Introduction

Person re-identification (Re-ID) is one of the primary research topics in computer vision (Chen & Wang, 2023; Chong, 2023). It plays an important role in modern applications such as surveillance, tracking, and intelligent retail, and has attracted widespread attention (Ming et al., 2022). Although numerous approaches aim to address occlusion and complex backgrounds in person Re-ID, current models often focus too much on obstructive areas when the occlusion or background exhibits features resembling those of the target person, which degrades performance.

Current Re-ID approaches focus on better learning of local features to enhance feature representation (Ming et al., 2021; Somers et al., 2023). However, a major challenge in Re-ID is the interference of complex backgrounds or occlusion with the model. Learning based solely on single-scale local features cannot avoid interference from backgrounds and occlusion, and common methods that horizontally segment feature maps (Sun et al., 2018) further disrupt the overall integrity of features. Although some methods attempt to model relationships between features by increasing kernel sizes (Li et al., 2021) or stacking convolutional layers (Chen et al., 2021), they remain constrained by limited receptive fields, making it difficult to achieve superior results.
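
For context, the sketch below illustrates the horizontal-partitioning idea referenced above (Sun et al., 2018), in which the backbone feature map is cut into stripes that are pooled independently; the class name, stripe count, and tensor sizes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class HorizontalStripePooling(nn.Module):
    """Sketch of PCB-style horizontal partitioning (Sun et al., 2018): the
    backbone feature map is cut into horizontal stripes and each stripe is
    pooled into its own part-level descriptor. Names and sizes are illustrative."""

    def __init__(self, num_stripes: int = 6):
        super().__init__()
        self.num_stripes = num_stripes

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W) backbone output; split along the height axis.
        stripes = feat_map.chunk(self.num_stripes, dim=2)
        # Average-pool each stripe to a C-dim part descriptor -> (B, p, C).
        return torch.stack([s.mean(dim=(2, 3)) for s in stripes], dim=1)


# Example: a 2048-channel 24x8 ResNet-style feature map -> 6 part descriptors.
feats = torch.randn(4, 2048, 24, 8)
print(HorizontalStripePooling(num_stripes=6)(feats).shape)  # torch.Size([4, 6, 2048])
```

Because each stripe is pooled in isolation, body parts that span stripe boundaries are split across descriptors, which is the loss of feature integrity noted above.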

In addition, non-local methods (Wang et al., 2018) consider the similarity between feature nodes, allowing information, or the activation values of feature nodes, to be transmitted among similar nodes. This approach partially addresses the modeling of long-range dependencies between features. However, methods based on feature similarity are often affected by similar interfering features. For instance, when complex backgrounds or occlusions prevent the extraction of complete features, interfering features that share similarities with the target features may also be activated, degrading the model's performance. As a consequence, such methods still cannot effectively address interference from backgrounds or occlusions (i.e., the target pedestrian being obscured by other pedestrians or objects). Therefore, a new method is urgently needed to thoroughly explore the relationships between features and optimize the extracted discriminative features, enabling the model to maintain high robustness and accuracy under complex background and occlusion conditions.
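
As a reference point, the following is a minimal sketch of an embedded-Gaussian non-local block in the spirit of Wang et al. (2018), showing how each position aggregates features from all other positions weighted by pairwise similarity; the class name, channel sizes, and reduction factor are illustrative assumptions rather than the original implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalBlock(nn.Module):
    """Sketch of an embedded-Gaussian non-local block (Wang et al., 2018):
    every spatial position aggregates information from all other positions,
    weighted by pairwise feature similarity. Channel sizes are illustrative."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.theta = nn.Conv2d(channels, inter, 1)   # query embedding
        self.phi = nn.Conv2d(channels, inter, 1)     # key embedding
        self.g = nn.Conv2d(channels, inter, 1)       # value embedding
        self.out = nn.Conv2d(inter, channels, 1)     # restore channel dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        k = self.phi(x).flatten(2)                            # (B, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)              # (B, HW, C')
        attn = F.softmax(q @ k, dim=-1)                       # pairwise similarity weights
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)   # propagate similar features
        return x + self.out(y)                                # residual connection


x = torch.randn(2, 256, 24, 8)
print(NonLocalBlock(256)(x).shape)  # torch.Size([2, 256, 24, 8])
```

The softmax similarity matrix weights every pairwise connection, so an occluding object whose features resemble the target also receives high weights, which is precisely the failure mode described above.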

To address this issue, we conducted an in-depth study of the relationships between features and propose an effective method to weaken connections between similar features of different categories. We introduce a universal Similar Feature Filtering (SFF) attention module. In essence, the SFF attention module, which is based on graph convolution, optimizes the relationship network within a topological graph whose nodes are features. First, a topological graph is constructed with pixel-level features as nodes and their similarities as edges. Then, the graph is decomposed: the graph signals are mapped onto basis coordinates in an N-dimensional space to obtain an activation component for each basis, and adaptive filtering is applied to part of these activation components. Finally, the topological graph is reconstructed to obtain the filtered relationship network. The SFF attention module simultaneously calculates spatial and channel attention, enabling the model to adaptively select pedestrian features, reduce interference from occlusion or background, and strengthen or weaken the fusion of semantic information as needed.
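
The code below is a conceptual sketch of this graph-spectral filtering idea only, not the actual SFF module (which additionally computes spatial and channel attention jointly); the cosine-similarity adjacency, Laplacian eigenbasis, learnable per-component gate, and all names and sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpectralFilterSketch(nn.Module):
    """Hedged sketch of the graph-filtering idea behind SFF, NOT the authors'
    implementation: pixel features form graph nodes, cosine similarity forms
    edges, the graph signal is projected onto the Laplacian eigenbasis, a
    learnable gate re-weights the resulting components, and the signal is
    reconstructed from the filtered components. Assumes a fixed H * W."""

    def __init__(self, num_nodes: int):
        super().__init__()
        # One learnable gate per spectral component (sigmoid keeps it in [0, 1]).
        self.gate = nn.Parameter(torch.zeros(num_nodes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> nodes: (B, N, C) with N = H * W pixel-level features.
        b, c, h, w = x.shape
        nodes = x.flatten(2).transpose(1, 2)
        # Similarity adjacency from cosine similarity between pixel features.
        z = F.normalize(nodes, dim=-1)
        adj = torch.relu(z @ z.transpose(1, 2))                   # (B, N, N)
        # Symmetrically normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
        deg = adj.sum(-1).clamp(min=1e-6).rsqrt()
        lap = torch.eye(h * w, device=x.device) - deg.unsqueeze(-1) * adj * deg.unsqueeze(1)
        # Decompose the graph: eigenvectors form the basis of the spectral space.
        _, basis = torch.linalg.eigh(lap)                         # (B, N, N)
        # Project node signals onto the basis -> one activation component per basis vector.
        comps = basis.transpose(1, 2) @ nodes                     # (B, N, C)
        # Adaptively filter (gate) the components, then reconstruct the signal.
        comps = comps * torch.sigmoid(self.gate).view(1, -1, 1)
        filtered = basis @ comps                                  # (B, N, C)
        return filtered.transpose(1, 2).reshape(b, c, h, w)


# Toy usage on a small feature map (N = 6 * 4 = 24 nodes).
x = torch.randn(2, 64, 6, 4)
print(SpectralFilterSketch(num_nodes=24)(x).shape)  # torch.Size([2, 64, 6, 4])
```

The key point is that filtering happens in the spectral (basis) domain: components associated with spurious similarity edges can be suppressed before the pixel-level relationship network is reconstructed.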
