In previous work, channel attention mechanisms have been widely used in person re-identification. However, the channel attention mechanism completely compresses the spatial dimension during its computation, which discards the diversity of channel information across different pixels. In this paper, a channel convolution residual block is proposed for more detailed inter-channel correlation modeling. First, we preserve spatial context information when introducing the channel dependency, which enables pixel-wise inter-channel correlation modeling. At the same time, a bottleneck strategy is used to reduce the number of parameters in the spatial dimension. Second, a channel convolution is employed in place of a fully connected layer to reduce the number of parameters in the channel dimension. In addition, the inter-channel correlation is merged into the backbone network directly in the form of a residual, so the block can be embedded into any deep neural network. Experiments on the Market-1501 and DukeMTMC-ReID datasets demonstrate that the channel convolution residual block effectively improves the accuracy of person re-identification.
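To make the design concrete, the sketch below shows one way such a block could be realized in PyTorch. The module name, the spatial bottleneck implemented with pooling and interpolation, the 1-D channel-convolution kernel size, and the sigmoid gating are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal PyTorch sketch of a pixel-wise channel-correlation block merged
# residually into a backbone. All hyper-parameters below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelConvResidualBlock(nn.Module):
    """Pixel-wise inter-channel correlation modeling with a residual merge."""

    def __init__(self, channels: int, spatial_reduction: int = 4, kernel_size: int = 3):
        super().__init__()
        self.spatial_reduction = spatial_reduction
        # 1-D convolution applied along the channel axis at every pixel,
        # replacing a fully connected layer over the channels.
        self.channel_conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Spatial bottleneck: shrink H x W to cut parameters and computation
        # while still keeping (coarse) spatial context instead of pooling it away.
        hs, ws = max(1, h // self.spatial_reduction), max(1, w // self.spatial_reduction)
        y = F.adaptive_avg_pool2d(x, (hs, ws))                 # (B, C, hs, ws)
        # Treat every remaining pixel as an independent sequence over channels.
        y = y.permute(0, 2, 3, 1).reshape(b * hs * ws, 1, c)   # (B*hs*ws, 1, C)
        y = self.channel_conv(y)                               # channel convolution
        y = y.reshape(b, hs, ws, c).permute(0, 3, 1, 2)        # (B, C, hs, ws)
        # Restore the original resolution and gate the input features.
        y = torch.sigmoid(F.interpolate(y, size=(h, w), mode='nearest'))
        # Residual merge: the correlation branch is added on top of the input,
        # so the block can be dropped into an existing backbone unchanged.
        return x + x * y
```

In such a design, the block would typically be inserted after selected convolutional stages of a ReID backbone (e.g., a ResNet-style network), leaving the rest of the architecture untouched.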
The spatial attention mechanism is widely used to extract local features in person re-identification. However, some existing multi-stage spatial attention structures lack flexibility and require a complicated training process. In this paper, a plug-and-play LSTM-based Attention Module (LAM) is proposed to enhance the flexibility of the multi-attention mechanism. First, we employ a single-stage multi-attention structure to replace the traditional multi-stage one. Our structure encapsulates multiple attention machines in a single module, so the module can be added to any backbone network directly, without modification. Then, correlation is introduced among the spatial attention machines through an LSTM. The correlation between different attention machines preserves the diversity of the local features and exploits the capacity of the multi-attention mechanism. Moreover, the LAM is added to the backbone network in the form of a residual, which enables the LAM to be trained synchronously with the backbone network and thus simplifies the training process effectively. Experiments on the CUHK03, Market-1501 and DukeMTMC-ReID datasets demonstrate the advantage of the proposed method.
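The sketch below illustrates one possible single-stage realization of such a module in PyTorch. The module name, the number of attention steps, the hidden size, and the way each LSTM state is turned into a spatial attention map are assumptions made for illustration only.

```python
# Minimal PyTorch sketch of a plug-and-play, single-stage multi-attention
# module in which an LSTM correlates successive spatial attention maps.
import torch
import torch.nn as nn


class LSTMAttentionModule(nn.Module):
    """Single-stage module producing several correlated spatial attention maps."""

    def __init__(self, channels: int, num_attentions: int = 4, hidden_dim: int = 128):
        super().__init__()
        self.num_attentions = num_attentions
        self.embed = nn.Conv2d(channels, hidden_dim, kernel_size=1)
        # The LSTM cell ties the attention machines together: each step sees a
        # summary of what the previous attention maps have already covered.
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.to_query = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        keys = self.embed(x).flatten(2)                     # (B, D, H*W)
        inp = keys.mean(dim=2)                              # initial global summary, (B, D)
        hx = torch.zeros_like(inp)
        cx = torch.zeros_like(inp)
        attn_sum = torch.zeros(b, 1, h * w, device=x.device, dtype=x.dtype)
        for _ in range(self.num_attentions):
            # The recurrent state carries earlier attention steps, which
            # correlates the attention maps and helps keep them diverse.
            hx, cx = self.lstm(inp, (hx, cx))
            q = self.to_query(hx).unsqueeze(1)              # (B, 1, D)
            attn = torch.softmax(q @ keys / keys.size(1) ** 0.5, dim=2)  # (B, 1, H*W)
            attn_sum = attn_sum + attn
            inp = (keys * attn).sum(dim=2)                  # attended summary feeds the next step
        # Residual merge: re-weight the input features with the accumulated
        # attention and add them back, so the LAM trains jointly with the backbone.
        refined = (x.flatten(2) * attn_sum).view(b, c, h, w)
        return x + refined
```

Because the residual output has the same shape as the input feature map, a module of this kind can be appended after a backbone stage and optimized end to end with the rest of the network.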