Region-Aware Temporal Inconsistency Learning for DeepFake Video Detection

Zhihao Gu; Taiping Yao; Yang Chen; Ran Yi; Shouhong Ding; Lizhuang Ma

doi:10.24963/ijcai.2022/129

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Main Track. Pages 920-926. https://doi.org/10.24963/ijcai.2022/129

The rapid development of face forgery techniques has drawn growing attention due to security concerns. Existing deepfake video detection methods typically try to capture discriminative features by directly applying static temporal convolution to mine temporal inconsistency, without explicitly exploring the diverse temporal dynamics of different forged regions. To capture these varied inconsistencies effectively and comprehensively, we propose a novel Region-Aware Temporal Filter (RATF) module that automatically generates corresponding temporal filters for different spatial regions. Specifically, we decouple the dynamic temporal kernel into a set of region-agnostic basic filters and region-sensitive aggregation weights. The different weights guide the corresponding regions to adaptively learn temporal inconsistency, which greatly enhances the overall representational ability. Moreover, to cover long-term temporal dynamics, we divide the video into multiple snippets and propose a Cross-Snippet Attention (CSA) module to promote information interaction across snippets. Extensive experiments and visualizations on several benchmarks demonstrate the effectiveness of our method against state-of-the-art competitors.
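
The decoupling of a dynamic temporal kernel into shared basic filters and per-region aggregation weights can be illustrated with a small sketch. This is not the authors' implementation: it assumes the per-region filter is a softmax-weighted linear combination of the basic filters, treats each region's feature as a single scalar sequence over time, and takes the region scores as given rather than predicting them from the input. All function and variable names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_filters(basic_filters, region_scores):
    """Mix K shared (region-agnostic) basic filters into one temporal
    filter per region, using region-specific aggregation weights.

    Assumption: aggregation is a softmax-weighted sum of the basic filters.
    """
    taps = len(basic_filters[0])
    filters = []
    for scores in region_scores:
        weights = softmax(scores)
        filters.append([
            sum(w * f[t] for w, f in zip(weights, basic_filters))
            for t in range(taps)
        ])
    return filters


def temporal_filter(sequence, kernel):
    """Correlate a 1-D temporal sequence with an odd-length kernel,
    zero-padding both ends so the output length matches the input."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(sequence) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(sequence))
    ]


def region_aware_temporal_filter(features, basic_filters, region_scores):
    """Apply a different temporal filter to each region's sequence."""
    kernels = region_filters(basic_filters, region_scores)
    return [temporal_filter(seq, k) for seq, k in zip(features, kernels)]


# Two basic filters: identity and a centered temporal difference.
basic = [[0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
# Two regions, each observed over three frames.
feats = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
# Hypothetical scores: region 0 mixes both filters, region 1 favors identity.
scores = [[0.0, 0.0], [50.0, -50.0]]
filtered = region_aware_temporal_filter(feats, basic, scores)
```

In this toy setup the two regions receive identical inputs but produce different outputs, because each region's scores select a different mixture of the same basic filters. A static temporal convolution would instead apply one kernel to every region.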

Keywords:

Computer Vision: Biometrics, Face, Gesture and Pose Recognition