Abstract:
Objectives Video surveillance systems play a vital role in security and supervision. A current research hotspot in computer vision is identifying abnormal pedestrian behaviors or events that pose potential security threats in video automatically, accurately, and without human intervention. Doing so would reduce the burden of manually reviewing large volumes of surveillance footage. In recent years, rapid progress in artificial intelligence has substantially advanced video anomaly detection. However, distinguishing the subtle differences between abnormal and normal behavior in changing and diverse environments remains challenging.
Methods We construct a new video pedestrian anomaly detection model based on a bidirectional prediction generative adversarial network (BiP-GAN). The model consists mainly of a CCA-U-Net generator and a Global-Patch discriminator. An optical flow model is used to compute the loss functions of the generator and the discriminator, because it is well suited to capturing optical flow changes and the motion characteristics of image sequences. The criss-cross attention (CCA)-U-Net generator is built on the classic U-Net and incorporates a CCA module that strengthens the model's ability to recognize key features of video behavior. The Global-Patch discriminator combines the global feature perception of a Global discriminator with the local feature perception of a Patch discriminator. This improves the model's perception of both global and local features, as well as its robustness and accuracy. During pre-training, BiP-GAN adopts a bidirectional prediction mode (forward prediction from the first 4 frames and reverse prediction from the last 4 frames). This lets the model better exploit the contextual features of the image sequence and generate higher-quality predicted frames. In addition, BiP-GAN uses a learning rate decay method that combines Warm-up with a cosine annealing function (CAF), accelerating convergence toward the global optimum and thus saving computing resources.
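To make the criss-cross attention idea concrete, here is a minimal single-head sketch in NumPy of the CCA operation as described in CCNet. It is not BiP-GAN's actual module. Each pixel attends only to the pixels in its own row and column, which gives H + W - 1 positions. The matrices `wq`, `wk`, `wv` and the scalar `gamma` are hypothetical stand-ins for the learned 1×1 convolutions (biases omitted) and the learned residual scale. CCNet applies this operation recurrently (twice) so that every pixel gathers context from the whole frame; the sketch shows a single pass.

```python
import numpy as np


def criss_cross_attention(x, wq, wk, wv, gamma=1.0):
    """One pass of criss-cross attention (illustrative sketch, not the paper's code).

    x:  feature map of shape (C, H, W)
    wq, wk: hypothetical query/key projections of shape (C', C), acting like 1x1 convs
    wv: hypothetical value projection of shape (C, C)
    """
    C, H, W = x.shape
    # 1x1 "convolutions": per-pixel linear projections of the channel vector.
    q = np.einsum("oc,chw->ohw", wq, x)
    k = np.einsum("oc,chw->ohw", wk, x)
    v = np.einsum("oc,chw->ohw", wv, x)
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            # Criss-cross path: all of row i, plus column j without repeating (i, j).
            rows = [i] * W + [r for r in range(H) if r != i]
            cols = list(range(W)) + [j] * (H - 1)
            keys = k[:, rows, cols]      # (C', H + W - 1)
            vals = v[:, rows, cols]      # (C,  H + W - 1)
            scores = q[:, i, j] @ keys   # affinity with each path position
            scores = scores - scores.max()  # numerically stable softmax
            attn = np.exp(scores)
            attn /= attn.sum()
            out[:, i, j] = vals @ attn   # attention-weighted sum of values
    # Residual connection scaled by gamma, as in CCNet.
    return gamma * out + x
```

Because the output at pixel (i, j) depends only on its row and column, a change at (0, 0) leaves pixel (1, 1) untouched after one pass. This is why CCNet stacks two passes to propagate information across the whole frame.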
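The abstract does not say exactly how the "first 4" and "last 4" frames relate to the frame being predicted. The sketch below assumes one common layout: a 9-frame window centered on a target frame. The 4 preceding frames, in temporal order, feed the forward predictor, and the 4 following frames, in reversed order, feed the reverse predictor. The function name and this layout are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bidirectional_clips(
    frames: Sequence[T], target: int, n: int = 4
) -> Tuple[List[T], List[T], T]:
    """Build forward/reverse inputs around `target` (assumed layout, not confirmed).

    Returns (forward_inputs, reverse_inputs, target_frame).
    """
    if target - n < 0 or target + n >= len(frames):
        raise ValueError("need n frames on both sides of the target frame")
    # Forward branch: the n frames before the target, oldest first.
    forward_inputs = list(frames[target - n:target])
    # Reverse branch: the n frames after the target, reversed so that the
    # sequence also runs "toward" the target frame.
    reverse_inputs = list(frames[target + 1:target + n + 1])[::-1]
    return forward_inputs, reverse_inputs, frames[target]
```

With integer frame indices, `bidirectional_clips(list(range(20)), 10)` yields forward inputs `[6, 7, 8, 9]`, reverse inputs `[14, 13, 12, 11]`, and target `10`. A generator would then predict frame 10 from each branch.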
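A learning rate schedule that combines Warm-up with cosine annealing can be sketched as follows. The learning rate rises linearly for `warmup_steps` steps. It then follows half a cosine from `base_lr` down to `min_lr` by `total_steps`. The parameter names and the linear shape of the warm-up are assumptions; the abstract does not give BiP-GAN's actual values.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed `step`: linear warm-up, then cosine annealing.

    All parameters are illustrative; they are not taken from the paper.
    """
    if step < warmup_steps:
        # Linear warm-up: base_lr / warmup_steps at step 0, base_lr at the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the cosine phase completed, from 0 to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In PyTorch, one way to use this is `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: warmup_cosine_lr(s, total_steps, warmup_steps, 1.0))`. Passing `base_lr=1.0` makes the function return a multiplier of the optimizer's initial learning rate. The warm-up avoids large, unstable updates while the GAN weights are still random, and the cosine tail shrinks the step size smoothly near the end of training.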
Results BiP-GAN was evaluated on the public datasets CUHK Avenue, UCSD Ped2, and ShanghaiTech. It achieved average areas under the curve (AUC) of 87.3%, 96.2%, and 73.9%, respectively, all higher than those of existing classic models such as Ada-GAN, Con-GAN, and Mul-GAN. Ablation experiments demonstrate the effectiveness of each component: the CCA-U-Net generator, the Global-Patch discriminator, the bidirectional prediction strategy, and the learning rate decay method combining Warm-up and CAF.
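For reference, the AUC metric can be computed from per-frame anomaly scores with the Mann-Whitney rank statistic. AUC is the probability that a randomly chosen abnormal frame scores higher than a randomly chosen normal one, with ties counting one half. This is a minimal sketch that assumes label 1 marks an abnormal frame and that a higher score means "more anomalous". The abstract does not state whether frames are pooled across videos or AUCs are averaged per video. The O(n²) loop is for clarity only; `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (1 = abnormal, 0 = normal)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal frame")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # abnormal frame correctly ranked above normal one
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`. Papers usually report this value multiplied by 100, as in the figures above.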
Conclusions The proposed BiP-GAN model effectively enhances the accuracy and robustness of video anomaly detection through bidirectional prediction, attention mechanisms, multi-scale discrimination, and an optimized training strategy. Experimental results demonstrate its superiority over existing models, confirming its potential for practical application in intelligent surveillance systems.