Video anomaly detection is widely used in domains such as public transportation, industrial production, city management, and the military to mitigate risks and enhance safety. To address the challenges of video anomaly detection in complex environments, we propose a lightweight yet efficient framework based on future frame prediction. The framework incorporates Convolutional Long Short-Term Memory (ConvLSTM), masked convolution, and attention mechanisms to improve detection accuracy. Furthermore, to reduce model complexity, we replace the convolutional layers in the network with depthwise separable convolutions (DSC). Evaluations on the public CUHK Avenue, UCSD Ped1, and UCSD Ped2 datasets show that the proposed model achieves both high accuracy and real-time performance.
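Replacing standard convolutions with DSC lowers complexity because a depthwise separable convolution factorizes one k × k convolution into a k × k depthwise convolution (one filter per input channel) followed by a 1 × 1 pointwise convolution. The sketch below counts the weights of each layer type. The helper names and the layer sizes used (64 input channels, 128 output channels, 3 × 3 kernels, no bias) are illustrative assumptions, not the configuration of the proposed network.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights in a standard k x k convolution mapping c_in -> c_out channels."""
    params = k * k * c_in * c_out
    if bias:
        params += c_out  # one bias per output channel
    return params


def dsc_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weights in a depthwise separable convolution: a k x k depthwise conv
    (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # spatial filtering, channels kept separate
    pointwise = c_in * c_out   # 1 x 1 conv mixes channels
    params = depthwise + pointwise
    if bias:
        params += c_in + c_out  # biases of the depthwise and pointwise stages
    return params


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = dsc_params(c_in, c_out, k)
    print(std, sep, round(sep / std, 3))  # prints: 73728 8768 0.119
```

Ignoring biases, the ratio of DSC to standard weights is 1/c_out + 1/k², here 1/128 + 1/9 ≈ 0.119. Multiply-accumulate operations per output position scale by the same factor, so the saving carries over to computation as well as model size.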