Drafting and Revision: Advancing High-Fidelity Video Inpainting
Zhiliang Wu, Kun Li, Hehe Fan, Yi Yang
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 2063-2071.
https://doi.org/10.24963/ijcai.2025/230
Video inpainting aims to fill missing regions in a video with spatio-temporally coherent content. Existing methods usually treat the missing content as a whole and train the model with a hybrid objective combining a reconstruction loss and an adversarial loss. However, these two losses focus on content at different frequencies, so simply combining them may cause inter-frequency conflicts and lead the trained model to generate compromised results. Inspired by the common corrupted-painting restoration practice of “drawing a draft first and then revising the details later”, this paper proposes a Drafting-and-Revision Completion Network (DRCN) for video inpainting. Specifically, we first design a Drafting Network that exploits temporal information to complete the low-frequency semantic structure at low resolution. Then, a Revision Network is developed to hallucinate high-frequency details at high resolution from the output of the Drafting Network. In this way, the adversarial loss and the reconstruction loss can be applied to the high-frequency and low-frequency content respectively, effectively mitigating inter-frequency conflicts. Furthermore, the Revision Network can be stacked in a pyramid manner to generate higher-resolution details, providing a feasible solution for high-resolution video inpainting. Experiments show that DRCN achieves improvements of 7.43% and 12.64% in E_warp and LPIPS, respectively, and can handle higher-resolution videos on limited GPU memory.
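The coarse-to-fine control flow described above can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's implementation: it uses 1-D signals in place of video frames, a mean-fill in place of the Drafting Network, and linear-interpolation upsampling in place of a learned Revision Network, purely to show the "draft once, then stack revisions" structure.

```python
def draft(signal, mask):
    """Drafting stand-in: fill masked samples with the mean of the known
    ones (the paper's network completes low-frequency structure instead)."""
    known = [v for v, m in zip(signal, mask) if not m]
    fill = sum(known) / len(known)
    return [fill if m else v for v, m in zip(signal, mask)]

def revise(signal):
    """Revision stand-in: upsample 2x by linear interpolation (the paper's
    network hallucinates high-frequency details at the higher resolution)."""
    out = []
    for a, b in zip(signal, signal[1:]):
        out += [a, (a + b) / 2]
    out.append(signal[-1])
    return out

def drafting_and_revision(signal, mask, num_revisions=2):
    """Coarse-to-fine completion: one low-resolution draft, then a pyramid
    of stacked revisions that each raise the resolution."""
    out = draft(signal, mask)
    for _ in range(num_revisions):
        out = revise(out)
    return out
```

Because each revision stage only ever operates on the current resolution level, stacking more stages raises the output resolution without the drafting stage ever touching a full-resolution tensor, which is the memory argument the abstract makes for high-resolution inpainting.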
Keywords:
Computer Vision: CV: Image and video synthesis and generation
