Introduction
Person re-identification (Re-ID) is one of the primary research topics in computer vision (Chen & Wang, 2023; Chong, 2023). It is of significant importance in modern applications such as surveillance, tracking, and intelligent retail, and has attracted widespread attention (Ming et al., 2022). Despite numerous approaches aimed at addressing occlusion and complex backgrounds in person Re-ID, current models often focus too much on obstructive areas when the occlusion or background exhibits features resembling those of the target person, leading to degraded performance.
Current Re-ID approaches focus on better learning local features to enhance feature representation (Ming et al., 2021; Somers et al., 2023). However, a major challenge in Re-ID is the interference of complex backgrounds or occlusion with the model. Learning based solely on single-scale local features cannot avoid such interference, and common methods that horizontally segment the feature map (Sun et al., 2018) further disrupt the overall integrity of features. Although some methods attempt to model relationships between features by enlarging kernel sizes (Li et al., 2021) or stacking convolutional layers (Chen et al., 2021), they remain constrained by limited receptive fields, making superior results difficult to achieve.
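The horizontal segmentation strategy mentioned above can be illustrated with a minimal sketch. The function name and stripe count are illustrative, not from the cited work; the sketch only shows how splitting a feature map into horizontal stripes yields part-level descriptors while discarding cross-stripe context:

```python
import numpy as np

def horizontal_stripe_features(feat_map, num_stripes=6):
    """Split a C x H x W feature map into horizontal stripes and
    average-pool each stripe into a part-level descriptor.

    feat_map: array of shape (C, H, W); H assumed divisible by num_stripes.
    Returns: array of shape (num_stripes, C).
    """
    c, h, w = feat_map.shape
    stripe_h = h // num_stripes
    parts = []
    for i in range(num_stripes):
        stripe = feat_map[:, i * stripe_h:(i + 1) * stripe_h, :]
        parts.append(stripe.mean(axis=(1, 2)))  # global average pool per stripe
    return np.stack(parts)

feat = np.random.rand(256, 24, 8)   # toy backbone output
parts = horizontal_stripe_features(feat, num_stripes=6)
print(parts.shape)  # (6, 256)
```

Because each stripe is pooled independently, a body part that drifts across the stripe boundary (or an occluder inside one stripe) corrupts that part's descriptor, which is the integrity issue the text points out.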
In addition, non-local methods (Wang et al., 2018) consider the similarity between feature nodes, allowing information, or the activation values of feature nodes, to be transmitted among similar nodes. This approach partially captures long-range dependencies between features. However, methods based on feature similarity are easily affected by similar interfering features. For instance, when complex backgrounds or occlusions prevent the extraction of complete features, interfering features that share similarities with the target features may also be activated, degrading the model's performance. Consequently, such methods still cannot effectively handle interference from backgrounds or occlusions (i.e., the target pedestrian being obscured by other pedestrians or objects). A new method is therefore urgently needed to thoroughly explore the relationships between features and optimize the extracted discriminative features, so that the model maintains high robustness and accuracy under complex background and occlusion conditions.
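The similarity-driven propagation described above can be sketched as follows. This is a simplified version of the non-local operation (the learned embedding projections of the original block are omitted), showing why every position similar to a given node receives its activation, including interfering ones:

```python
import numpy as np

def non_local_attention(x):
    """Minimal non-local update on a flattened feature map: each position
    is replaced by itself plus a similarity-weighted sum of all positions,
    so activations propagate between similar feature nodes.

    x: array of shape (N, C) -- N spatial positions, C channels.
    Returns: refined features of shape (N, C).
    """
    sim = x @ x.T                               # pairwise similarities (N, N)
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over all positions
    return x + attn @ x                         # residual connection
```

Note that the softmax weights depend only on similarity: a background or occluder node that happens to resemble the target receives, and transmits, activation just like a genuine target node, which is exactly the failure mode discussed above.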
To address this issue, we conduct an in-depth study of the relationships between features and propose an effective method for weakening the connections between similar features of different categories. We introduce a universal similar feature filtering (SFF) attention module. In essence, the SFF attention module, built on graph convolution, optimizes the relationship network within a topological graph whose nodes are features. First, a topological graph is constructed with pixel-level features as nodes and their similarities as edges. Then, by decomposing the graph, the graph signals are mapped onto basis coordinates in an N-dimensional space to obtain the activation component on each basis, and adaptive filtering is applied to part of these activation components. Finally, the topological graph is reconstructed to obtain the filtered relationship network. The SFF attention module computes spatial attention and channel attention simultaneously, enabling the model to adaptively select pedestrian features, reducing interference from occlusion or background, and adaptively strengthening or weakening the fusion of semantic information.
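The construct-decompose-filter-reconstruct pipeline above can be sketched with a graph-spectral toy example. All names, the cosine-similarity adjacency, and the fixed attenuation rule are illustrative assumptions, not the paper's exact formulation; a learned, adaptive filter would replace the hard-coded damping:

```python
import numpy as np

def spectral_filter(x, keep_ratio=0.5, atten=0.1):
    """Illustrative sketch of filtering a feature graph in its spectral basis.

    x: (N, C) pixel-level feature nodes.
    Returns: (N, C) features after attenuating part of the spectral components.
    """
    n = len(x)
    # 1. Topological graph: non-negative cosine similarity as edge weights.
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    a = np.clip(xn @ xn.T, 0.0, None)                      # adjacency (N, N)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1) + 1e-8))
    lap = np.eye(n) - d_inv_sqrt @ a @ d_inv_sqrt          # normalized Laplacian
    # 2. Decompose: eigenvectors of the Laplacian form the basis
    #    of the N-dimensional space; project the graph signal onto it.
    eigvals, eigvecs = np.linalg.eigh(lap)
    coeffs = eigvecs.T @ x                                 # activation per basis
    # 3. Filter part of the activation components (here: damp the
    #    high-frequency half, a hard-coded stand-in for adaptive filtering).
    cut = int(n * keep_ratio)
    coeffs[cut:] *= atten
    # 4. Reconstruct the topological graph signal from the filtered components.
    return eigvecs @ coeffs
```

With `atten=1.0` the reconstruction is exact (the eigenbasis is orthonormal), so the function reduces to an identity; damping selected components is what weakens unwanted similarity connections in the reconstructed relationship network.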