A Sketch-Transformer Network for Face Photo-Sketch Synthesis

Mingrui Zhu; Changcheng Liang; Nannan Wang; Xiaoyu Wang; Zhifeng Li; Xinbo Gao

doi:10.24963/ijcai.2021/187

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

Main Track. Pages 1352-1358. https://doi.org/10.24963/ijcai.2021/187

We present a face photo-sketch synthesis model that converts a face photo into an artistic face sketch or recovers a photo-realistic facial image from a sketch portrait. Convolutional neural networks (CNNs) and generative adversarial networks (GANs) have driven recent progress, so that promising results can be obtained through real-time end-to-end architectures. However, convolutional architectures tend to focus on local information and neglect long-range spatial dependency, which limits the ability of existing approaches to preserve global structural information. In this paper, we propose a Sketch-Transformer network for face photo-sketch synthesis, which consists of three closely related modules: a multi-scale feature and position encoder for patch-level feature and position embedding, a self-attention module for capturing long-range spatial dependency, and a multi-scale spatially-adaptive de-normalization decoder for image reconstruction. This design enables the model to generate reasonable texture detail while maintaining global structural information. Extensive experiments show that the proposed method achieves significant improvements over state-of-the-art approaches in both quantitative and qualitative evaluations.
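
To illustrate how self-attention over patch-level features can capture long-range spatial dependency, the following is a minimal sketch in plain Python. It is not the paper's implementation: the helper names (`split_into_patches`, `self_attention`), the toy 4x4 image, the single attention head, and the use of identity query/key/value projections are all assumptions made for illustration, and position embedding and the multi-scale decoder are omitted.

```python
import math

def split_into_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights, outputs = [], []
    for q in tokens:
        # Score this patch against every patch, regardless of spatial distance.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mixture of all patch vectors.
        outputs.append([sum(w * v[c] for w, v in zip(row, tokens))
                        for c in range(d)])
    return outputs, weights

# Hypothetical toy input: a 4x4 image split into four 2x2 patches.
image = [[0.0, 0.1, 0.9, 1.0],
         [0.1, 0.0, 1.0, 0.9],
         [0.9, 1.0, 0.0, 0.1],
         [1.0, 0.9, 0.1, 0.0]]
tokens = split_into_patches(image, 2)
outputs, weights = self_attention(tokens)
```

Every row of `weights` is a distribution over all patches, so each output token mixes information from the whole image, including the diagonally opposite corner. A convolution with a small kernel would only see a local neighborhood at each layer.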

Keywords:

Computer Vision: 2D and 3D Computer Vision

Computer Vision: Biometrics, Face and Gesture Recognition