Automatic Detection of Abnormal Events in Pedestrian Videos Using BiP-GAN

张杰; 杨雪; 龚智龙; 关庆锋

doi:10.13203/j.whugis20240259

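Summary: Video surveillance systems play a vital role in security and supervision. Automatically and accurately identifying abnormal pedestrian behaviors or events that pose potential security threats, without human intervention, would relieve the burden of manually reviewing large volumes of surveillance footage, and it is a current research focus in computer vision. Recent advances in artificial intelligence have greatly improved video anomaly detection, but distinguishing subtle differences between abnormal and normal behavior in changing and diverse environments remains challenging. This paper constructs a new bidirectional prediction generative adversarial network (BiP-GAN) model for pedestrian anomaly detection in video. The model consists mainly of a criss-cross attention (CCA)-U-Net generator and a Globle-Patch discriminator, and it uses an optical flow model, which is well suited to capturing optical flow changes and the motion features of image sequences, in computing the loss functions of the generator and the discriminator. The CCA-U-Net generator builds on the classic U-Net module and adds a CCA module to strengthen the model's ability to recognize key features of behavior in video. The Globle-Patch discriminator combines the Globle discriminator's sensitivity to global features with the Patch discriminator's sensitivity to local features, improving the model's perception of both and, in turn, its robustness and accuracy. The pre-training strategy of BiP-GAN uses bidirectional prediction, with forward prediction from the preceding 4 frames and backward prediction from the following 4 frames, so that the model makes better use of the context of the image sequence and generates higher-quality predicted frames. In addition, BiP-GAN uses a learning rate decay method that combines Warm-up with a cosine annealing function (CAF), which speeds up the search for the global optimum and thereby saves computing resources. BiP-GAN was validated and analyzed on the public datasets CUHK Avenue, UCSD ped2, and ShanghaiTech, where its average area under the curve (AUC) is 87.3, 96.2, and 73.9, respectively, all higher than those of existing classic models (such as Ada-GAN, Con-GAN, and Mul-GAN). Ablation experiments show that the CCA-U-Net generator, the Globle-Patch discriminator, the bidirectional prediction strategy, and the learning rate decay method combining Warm-up and CAF are effective for the model.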

Abstract:

Objectives Video surveillance systems play a vital role in security and supervision. Automatically and accurately identifying abnormal pedestrian behaviors or events that pose potential security threats, without human intervention, would reduce the burden of manually reviewing large volumes of surveillance footage, and it is a current research focus in computer vision. The rapid development of artificial intelligence in recent years has greatly improved video anomaly detection. However, distinguishing subtle differences between abnormal and normal behavior in changing and diverse environments remains challenging.

Methods We construct a new pedestrian video anomaly detection model based on a bidirectional prediction generative adversarial network (BiP-GAN). The model consists mainly of a criss-cross attention (CCA)-U-Net generator and a Globle-Patch discriminator. Because an optical flow model is well suited to capturing optical flow changes and the motion features of image sequences, it is used in computing the loss functions of the generator and the discriminator. The CCA-U-Net generator builds on the classic U-Net module and adds a CCA module to strengthen the model's ability to recognize key features of behavior in video. The Globle-Patch discriminator combines the Globle discriminator's sensitivity to global features with the Patch discriminator's sensitivity to local features, which improves the model's perception of both and, in turn, its robustness and accuracy. The pre-training strategy of BiP-GAN uses bidirectional prediction, with forward prediction from the preceding 4 frames and backward prediction from the following 4 frames, so that the model makes better use of the context of the image sequence and generates higher-quality predicted frames. In addition, BiP-GAN uses a learning rate decay method that combines Warm-up with a cosine annealing function (CAF), which speeds up the search for the global optimum and thereby saves computing resources.
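As a rough illustration of a learning rate schedule that combines Warm-up with cosine annealing, the following minimal sketch ramps the rate up linearly and then decays it along a half cosine. The abstract does not give the formula or hyperparameters, so the function name `warmup_cosine_lr`, the linear form of the warm-up, and all numeric values here are assumptions chosen for illustration, not the settings used for BiP-GAN.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Return the learning rate for a 0-indexed training step.

    Hypothetical helper: linear warm-up to base_lr, then cosine annealing
    from base_lr down to min_lr over the remaining steps.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp: base_lr / warmup_steps at step 0, base_lr at the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Half-cosine decay from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Illustrative values only (assumed, not taken from the paper).
schedule = [
    warmup_cosine_lr(s, total_steps=100, warmup_steps=10, base_lr=2e-4)
    for s in range(100)
]
```

The warm-up phase keeps early updates small while the generator and discriminator are still unstable, and the cosine phase lowers the rate smoothly rather than in abrupt steps.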

Results BiP-GAN is validated and analyzed on the public datasets CUHK Avenue, UCSD ped2, and ShanghaiTech. Its average area under the curve (AUC) on these datasets is 87.3, 96.2, and 73.9, respectively, all higher than those of existing classic models (such as Ada-GAN, Con-GAN, and Mul-GAN). Ablation experiments show that the CCA-U-Net generator, the Globle-Patch discriminator, the bidirectional prediction strategy, and the learning rate decay method combining Warm-up and CAF are effective for the model.
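For readers unfamiliar with the metric, the sketch below shows one common way to compute the area under the ROC curve from per-frame anomaly scores and binary ground-truth labels, using the pairwise (Mann-Whitney) formulation. The abstract does not state how BiP-GAN derives its anomaly scores or at what granularity AUC is evaluated, so the function name `roc_auc`, the per-frame setup, and the toy data are assumptions for illustration only.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly scores, higher meaning more anomalous (assumed convention).
    labels: 1 for abnormal frames, 0 for normal frames.
    Equals the probability that a random abnormal frame scores higher than
    a random normal frame, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two normal frames followed by two abnormal frames.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_labels)  # 3 of 4 abnormal/normal pairs are ordered correctly
```

The pairwise form is quadratic in the number of frames, which is acceptable for a small illustration; a sort-based implementation would be preferable for full datasets.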

Conclusions The proposed BiP-GAN model effectively enhances the accuracy and robustness of video anomaly detection through bidirectional prediction, attention mechanisms, multi-scale discrimination, and an optimized training strategy. Experimental results demonstrate its superiority over existing models, confirming its potential for practical application in intelligent surveillance systems.
