Deepfake Video Detection Using 3DMM Facial Reconstruction Information

胡永健; 佘惠敏; 刘琲贝; 陈香全; 刘光尧

doi:10.13203/j.whugis20210427

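Abstract:
A deepfake video detection algorithm based on the 3D morphable model (3DMM) of the face is proposed. It exploits the 3DMM's strong ability to estimate facial shape, texture, expression and pose parameters to obtain, frame by frame, the basic information for forgery detection. A facial behavior feature computation module and a static appearance feature extraction module are designed. Working on a sliding-window basis along the time axis, they extract the person's facial behavior features from the expression and pose parameters and compute the person's static appearance features from the shape and texture parameters. Detection relies on the consistency between the person's appearance features and facial behavior features. The proposed algorithm is strongly person-specific and readily interpretable. Compared with similar algorithms, it achieves a lower half total error rate, better robustness to video compression, and simpler computation.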
Objectives The emergence of deepfake techniques has created a worldwide information security problem. Deepfake videos are used to manipulate and mislead the public. Although a variety of deepfake detection methods exist, the features they extract generally suffer from poor interpretability. To address this problem, a deepfake video detection method based on the 3D morphable model (3DMM) of the face is proposed.
Methods The 3DMM is employed to estimate the shape, texture, expression, and pose parameters of the face frame by frame, and these parameters constitute the basic information for deepfake detection. A facial behavior feature extraction module and a static face appearance feature extraction module are designed to construct feature vectors on a sliding-window basis. The facial behavior feature vector is derived from the expression and pose parameters, while the appearance feature vector is calculated from the shape and texture parameters. The consistency between the appearance feature vector and the behavior feature vector, measured by cosine distance, is the criterion for authenticating the face in each sliding window across the video.
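The per-window consistency check described in Methods can be illustrated with a minimal sketch. It assumes the per-frame 3DMM parameters have already been estimated and stored as arrays; the parameter dimensions, the window length and stride, and the helper `make_placeholder_embedder` (an untrained random projection standing in for the two feature modules) are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sliding_windows(n_frames, win, stride):
    """Yield (start, end) frame indices of every full window."""
    for start in range(0, n_frames - win + 1, stride):
        yield start, start + win


def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity; eps avoids division by zero."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)


def make_placeholder_embedder(in_dim, out_dim, seed):
    """Hypothetical stand-in for a learned feature module:
    average the window over time, then apply a fixed random projection."""
    proj = np.random.default_rng(seed).standard_normal((in_dim, out_dim))
    return lambda window: window.mean(axis=0) @ proj


def window_distances(shape, texture, expression, pose,
                     embed_appearance, embed_behavior, win=100, stride=50):
    """Cosine distance between appearance and behavior vectors per window.

    Each input is an (n_frames, dim) array of per-frame 3DMM parameters.
    win and stride are assumed values, not the paper's settings.
    """
    distances = []
    for s, e in sliding_windows(len(shape), win, stride):
        # Static appearance: shape and texture parameters of the window.
        static = np.concatenate([shape[s:e], texture[s:e]], axis=1)
        # Facial behavior: expression and pose parameters of the window.
        dynamic = np.concatenate([expression[s:e], pose[s:e]], axis=1)
        distances.append(
            cosine_distance(embed_appearance(static), embed_behavior(dynamic)))
    return np.array(distances)
```

In such a setup, a window would presumably be flagged as fake when its distance exceeds a threshold chosen on validation data; the abstract does not state the threshold or how the two feature modules are trained.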
Results The effectiveness of the proposed method is evaluated on three public datasets. The overall half total error rates (HTER) obtained on the FF++, DFD and Celeb-DF datasets are 1.33%, 4.93% and 3.92%, respectively. For the severely compressed videos, C40 of DFD, the HTER is 7.09%, showing good robustness against video compression. The model complexity is about one quarter of that of the most closely related work.
Conclusions The proposed algorithm is strongly person-specific and clearly interpretable. Compared with state-of-the-art methods in the literature, it demonstrates lower half total error rates, better resistance to video compression and lower computational cost.
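For readers unfamiliar with the metric, the half total error rate is conventionally the mean of the false acceptance rate and the false rejection rate. The sketch below assumes that standard definition and a labeling convention of 1 = fake, 0 = real; the abstract does not state which convention or decision threshold the authors used, and the function name is illustrative.

```python
import numpy as np


def half_total_error_rate(labels, predictions):
    """HTER = (FAR + FRR) / 2, with 1 = fake and 0 = real (assumed)."""
    labels = np.asarray(labels, dtype=bool)
    predictions = np.asarray(predictions, dtype=bool)
    n_fake = labels.sum()
    n_real = (~labels).sum()
    if n_fake == 0 or n_real == 0:
        raise ValueError("both real and fake samples are required")
    # FAR: fraction of fake samples accepted as real.
    far = (labels & ~predictions).sum() / n_fake
    # FRR: fraction of real samples rejected as fake.
    frr = (~labels & predictions).sum() / n_real
    return float((far + frr) / 2)
```

With four fake and two real samples, one missed fake and one falsely rejected real give FAR = 0.25 and FRR = 0.5, so HTER = 0.375.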
