ZHANG Jie, YANG Xue, GONG Zhilong, GUAN Qingfeng. Bidirectional Prediction BiP-GAN Pedestrian Video Anomaly Event Automatic Detection[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20240259
Video surveillance systems play a vital role in public safety and supervision. Automatically and accurately identifying abnormal behaviors or events that pose potential security threats in video, without human intervention, and thereby reducing the burden of manually reviewing large volumes of surveillance footage, is a current research focus in computer vision. In recent years, rapid progress in artificial intelligence has greatly advanced video anomaly detection, but distinguishing the subtle differences between abnormal and normal behaviors in changing and diverse environments remains challenging. This paper proposes a bidirectional prediction GAN (BiP-GAN) model for pedestrian anomaly detection in video. The model consists mainly of a CCA-UNet generator and a Global-Patch discriminator. An optical flow model is used in computing the loss functions of the generator and discriminator, exploiting optical flow changes and the motion characteristics of the image sequence. The CCA-UNet generator builds on the classic U-Net and introduces a Criss-Cross Attention (CCA) module to strengthen the model's ability to recognize key features of behavior in video. The Global-Patch discriminator combines a global discriminator with a patch discriminator, improving the model's perception of both global and local features and, in turn, its robustness and accuracy. The BiP-GAN pre-training strategy adopts bidirectional prediction, with forward prediction from the first 4 frames and backward prediction from the last 4 frames, so that the model better exploits the context of the image sequence and generates predicted frames of higher image quality.
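The bidirectional prediction strategy described above can be illustrated with a minimal indexing sketch. It assumes the target frame sits between two 4-frame windows; the function name `bidirectional_windows`, the parameter `k` and the exact window placement are hypothetical choices for illustration, not the authors' implementation.

```python
def bidirectional_windows(num_frames, target, k=4):
    """Return (forward_idx, backward_idx) for predicting frame `target`.

    forward_idx:  the k frames before the target, in temporal order.
    backward_idx: the k frames after the target, in reversed temporal
                  order, so a predictor reading them left to right
                  moves toward the target frame.
    Assumption: the target lies between the two windows.
    """
    if target - k < 0 or target + k >= num_frames:
        raise ValueError("target needs k frames on both sides")
    forward_idx = list(range(target - k, target))
    backward_idx = list(range(target + k, target, -1))
    return forward_idx, backward_idx


fwd, bwd = bidirectional_windows(num_frames=20, target=10)
print(fwd)  # [6, 7, 8, 9]
print(bwd)  # [14, 13, 12, 11]
```

Under this assumption, both windows predict the same frame, so the two predictions can be compared against one ground-truth frame during training.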
In addition, BiP-GAN adopts a learning rate decay schedule that combines warm-up with CAF (Cosine Annealing learning rate Function) to speed up the model's search for the global optimum and thus save computational resources. BiP-GAN was validated and analyzed on the public datasets CUHK Avenue, UCSD Ped2 and ShanghaiTech. Its average AUC values were 87.3%, 96.2% and 73.9%, respectively, all higher than those of the baseline models (e.g., AdaGAN, Con-GAN, Mu-gan). Ablation experiments demonstrate the effectiveness of the CCA-UNet generator, the Global-Patch discriminator, the bidirectional prediction strategy and the learning rate decay schedule combining warm-up and CAF.
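A learning rate schedule that combines warm-up with cosine annealing can be sketched as follows. This is a generic formulation assuming linear warm-up; the function name `lr_at` and all hyperparameter values are hypothetical and are not taken from the paper.

```python
import math


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine annealing down to min_lr.

    All arguments are illustrative assumptions, not values from the paper.
    """
    if step < warmup_steps:
        # Warm-up phase: grow linearly, reaching base_lr at the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Annealing phase: progress runs from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Print the schedule at a few steps of a hypothetical 100-step run.
for s in (0, 5, 10, 55, 100):
    print(s, lr_at(s, total_steps=100, warmup_steps=10, base_lr=1e-3))
```

The warm-up phase avoids large updates while the weights are still near their initial values, and the cosine phase lowers the rate smoothly instead of in discrete steps.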