DcDsDiff: Dual-Conditional and Dual-Stream Diffusion Model for Generative Image Tampering Localization
Qixian Hao, Shaozhang Niu, Jiwei Zhang, Kai Wang
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 1071-1079.
https://doi.org/10.24963/ijcai.2025/120
Generative Image Tampering (GIT), owing to its high diversity and realism, poses a significant challenge to traditional image tampering localization techniques. To address this, we introduce DcDsDiff, a denoising diffusion probabilistic model-based approach comprising a Dual-View Conditional Network (DVCN) and a Dual-Stream Denoising Network (DSDN). DVCN provides clues about the tampered regions by extracting tampering features in the high-frequency view and integrating them with spatial-domain features via attention mechanisms. DSDN jointly generates the mask image and the detail image, and its iterative denoising enhances the model's generalization to new tampering forms. A multi-stream interaction mechanism lets the two generative tasks reinforce each other, prompting the model to produce localization results that are both detailed and complete. Experiments show that DcDsDiff outperforms mainstream methods in localization accuracy, generalization, extensibility, and robustness. Code page: https://github.com/QixianHao/DcDsDiff-and-GIT10K.
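
The sketch below is a minimal, hypothetical illustration of the two ideas named in the abstract, not the authors' released implementation (see the code page above for that): a high-frequency conditional view fused with spatial-domain features via attention, and a dual-stream denoiser whose mask and detail streams exchange features at each step. All module names, channel sizes, and the specific high-pass filter and interaction layout are assumptions for illustration only.

```python
# Hypothetical sketch of DVCN-style conditioning and a DSDN-style dual-stream step.
# Channel sizes, the Laplacian high-pass filter, and the 1x1 interaction layer are
# illustrative assumptions, not details taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighFreqView(nn.Module):
    """Extract a high-frequency view with a fixed Laplacian-style high-pass filter."""
    def __init__(self, in_ch=3):
        super().__init__()
        k = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
        self.register_buffer("kernel", k.view(1, 1, 3, 3).repeat(in_ch, 1, 1, 1))
        self.in_ch = in_ch

    def forward(self, x):
        return F.conv2d(x, self.kernel, padding=1, groups=self.in_ch)


class DualViewCondition(nn.Module):
    """Fuse spatial and high-frequency features with multi-head cross-attention."""
    def __init__(self, ch=64):
        super().__init__()
        self.spatial = nn.Conv2d(3, ch, 3, padding=1)
        self.freq = nn.Conv2d(3, ch, 3, padding=1)
        self.hf = HighFreqView()
        self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)

    def forward(self, img):
        s = self.spatial(img)                      # spatial-domain features
        f = self.freq(self.hf(img))                # high-frequency-view features
        b, c, h, w = s.shape
        q = s.flatten(2).transpose(1, 2)           # queries from the spatial view
        kv = f.flatten(2).transpose(1, 2)          # keys/values from the frequency view
        fused, _ = self.attn(q, kv, kv)
        return (q + fused).transpose(1, 2).view(b, c, h, w)


class DualStreamDenoiser(nn.Module):
    """One denoising step: mask and detail streams that share features mid-way."""
    def __init__(self, ch=64):
        super().__init__()
        self.in_mask = nn.Conv2d(1 + ch, ch, 3, padding=1)
        self.in_detail = nn.Conv2d(1 + ch, ch, 3, padding=1)
        self.interact = nn.Conv2d(2 * ch, ch, 1)   # simple multi-stream interaction
        self.out_mask = nn.Conv2d(ch, 1, 3, padding=1)
        self.out_detail = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, noisy_mask, noisy_detail, cond):
        m = F.relu(self.in_mask(torch.cat([noisy_mask, cond], dim=1)))
        d = F.relu(self.in_detail(torch.cat([noisy_detail, cond], dim=1)))
        shared = self.interact(torch.cat([m, d], dim=1))
        return self.out_mask(m + shared), self.out_detail(d + shared)


if __name__ == "__main__":
    img = torch.randn(1, 3, 64, 64)
    cond = DualViewCondition()(img)
    eps_m, eps_d = DualStreamDenoiser()(torch.randn(1, 1, 64, 64),
                                        torch.randn(1, 1, 64, 64), cond)
    print(eps_m.shape, eps_d.shape)  # torch.Size([1, 1, 64, 64]) for both streams
```

In an actual diffusion pipeline, a step like this would be wrapped in the usual DDPM noise-prediction training and iterative sampling loop, with the conditional features recomputed once per input image rather than per step.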
Keywords:
Computer Vision: CV: Segmentation, grouping and shape analysis
Computer Vision: CV: Multimodal learning
Computer Vision: CV: Recognition (object detection, categorization)
