We propose a parallel network with spatial–temporal attention for video-based person re-identification. Many previous video-based person re-identification methods use two-dimensional convolutional neural networks to extract spatial features and then extract temporal features with temporal pooling or recurrent neural networks. Unfortunately, such serial networks lose spatial information while extracting temporal information. In contrast, our parallel network extracts temporal and spatial features simultaneously, which effectively reduces the loss of spatial information. In addition, we design a global temporal attention module that obtains attention weights from the correlation between the current frame and all frames in the sequence. At the same time, the temporal module guides the feature extraction of the spatial module, strengthening both the temporal and spatial constraints. Experiments show that our method effectively improves re-identification accuracy and outperforms state-of-the-art methods.
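As a rough illustration only (not the authors' implementation, whose details are not given in this abstract), the global temporal attention idea — weighting each frame by its correlation with every frame in the sequence — can be sketched in NumPy as follows. The function name, the dot-product correlation, and the scaling choice are all assumptions for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_temporal_attention(features):
    """features: (T, D) array of per-frame feature vectors.

    For each frame, attention weights are derived from its
    correlation (here, a scaled dot product -- an assumption)
    with every frame in the sequence; the output is each frame's
    attention-weighted combination of the whole sequence.
    """
    T, D = features.shape
    scores = features @ features.T / np.sqrt(D)  # (T, T) frame-to-frame correlation
    weights = softmax(scores, axis=-1)           # each row sums to 1
    attended = weights @ features                # (T, D) temporally attended features
    return attended, weights

# usage: 8 frames, 16-dimensional features
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16)).astype(np.float64)
attended, weights = global_temporal_attention(feats)
```

Because every frame attends to the entire sequence, each output feature mixes in information from all frames, which is what distinguishes this "global" scheme from per-frame or pairwise temporal weighting.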