Stabilizing Holistic Semantics in Diffusion Bridge for Image Inpainting

Jinjia Peng, Mengkai Li, Huibing Wang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 1756-1764. https://doi.org/10.24963/ijcai.2025/196

Image inpainting aims to restore the original image from a damaged version. Recently, a special class of diffusion bridge models has achieved promising performance by directly modeling the degradation process and restoring corrupted images through the corresponding reverse process. However, because the denoising process lacks explicit semantic priors, the inpainted results typically exhibit poor contextual stability and semantic consistency. To this end, this paper proposes a novel Global Structure-Guided Diffusion Bridge framework (GSGDiff), which incorporates an additional structure restorer to stabilize the generation of holistic semantics. Specifically, to acquire richer semantic structure priors, this paper proposes a posterior sampling approach that captures semantically global and consistent structures at each timestep and efficiently integrates them into texture generation through a corresponding guidance module. Additionally, since diffusion models denoise only weakly at larger timesteps, this paper proposes a semantic fusion schedule that avoids noise interference by reducing the weight of ineffective guided semantics in the early stages. By applying the proposed posterior sampling to the texture denoising process, GSGDiff achieves more stable and superior inpainting results than competitive baselines. Experiments on the Places2, Paris Street View, and CelebA-HQ datasets validate the efficacy of the proposed method.
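The semantic fusion schedule can be illustrated with a minimal sketch. The function names, the sigmoid form of the schedule, and the simple convex blend below are all assumptions for illustration, not the paper's implementation; the only property taken from the abstract is that the structure guidance carries little weight at large timesteps (early, noisy stages) and more weight as denoising progresses.

```python
import math

def fusion_weight(t, T, k=10.0):
    # Hypothetical schedule: near 0 at t = T (early, noisy stage),
    # near 1 at t = 0 (late stage), decaying smoothly via a sigmoid.
    return 1.0 / (1.0 + math.exp(k * (t / T - 0.5)))

def guided_update(texture_x, structure_x, t, T):
    # Illustrative guidance step: blend the structure restorer's
    # estimate into the texture branch with a timestep-dependent weight.
    w = fusion_weight(t, T)
    return [(1.0 - w) * tx + w * sx for tx, sx in zip(texture_x, structure_x)]
```

For example, with `T = 1000`, the guidance weight is below 0.01 at `t = 1000` and above 0.99 at `t = 0`, so the structure prior only dominates once the texture estimate is largely denoised.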
Keywords:
Computer Vision: CV: Multimodal learning
Computer Vision: CV: Representation learning