Learning Attention from Attention: Efficient Self-Refinement Transformer for Face Super-Resolution

Guanxin Li, Jingang Shi, Yuan Zong, Fei Wang, Tian Wang, Yihong Gong

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 1035-1043. https://doi.org/10.24963/ijcai.2023/115

Recently, Transformer-based architectures have been introduced into the face super-resolution task due to their advantage in capturing long-range dependencies. However, these approaches tend to integrate global information over a large search region, which neglects the most relevant information and induces blurry effects from irrelevant textures. Some improved methods simply constrain self-attention to a local window to suppress useless information, but this also limits the capability to recover high-frequency details when flat areas dominate the local search window. To address these issues, we propose a novel self-refinement mechanism that adaptively achieves texture-aware reconstruction in a coarse-to-fine procedure. Specifically, primary self-attention is first conducted to reconstruct coarse-grained textures and detect the fine-grained regions that require further compensation. Then, region selection attention is performed to refine the textures in these key regions. Since self-attention treats the channel information of tokens equally, we employ a dual-branch feature integration module to privilege the important channels during feature extraction. Furthermore, we design a wavelet fusion module that integrates shallow-layer structural features and deep-layer detailed features to recover realistic face images in the frequency domain. Extensive experiments demonstrate the effectiveness of our method on a variety of datasets.
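
As a rough illustration of the coarse-to-fine self-refinement mechanism described above, the following is a minimal PyTorch-style sketch of one refinement step. The function name, the shared projection weights, and the use of attention entropy as the criterion for detecting regions that need further compensation are assumptions made for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def self_refinement_attention(x, wq, wk, wv, top_k):
    """Hypothetical coarse-to-fine refinement step (a sketch, not the paper's exact method).

    x: (B, N, C) token features; wq, wk, wv: (C, C) projections;
    top_k: number of tokens treated as fine-grained regions needing compensation.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scale = q.shape[-1] ** -0.5
    attn = F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)  # (B, N, N)
    coarse = attn @ v  # coarse-grained reconstruction from primary self-attention

    # Assumption: tokens whose attention is spread thin (high entropy) are
    # treated as under-reconstructed regions that require refinement.
    entropy = -(attn * attn.clamp_min(1e-9).log()).sum(dim=-1)  # (B, N)
    idx = entropy.topk(top_k, dim=-1).indices                   # selected key regions

    # Region selection attention: a second pass restricted to the selected tokens.
    sel = torch.gather(coarse, 1, idx.unsqueeze(-1).expand(-1, -1, coarse.shape[-1]))
    q2, k2, v2 = sel @ wq, sel @ wk, sel @ wv
    attn2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
    refined = attn2 @ v2

    # Write the refined tokens back into the coarse reconstruction.
    out = coarse.clone()
    out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, out.shape[-1]), refined)
    return out

# Example usage with random features:
# x = torch.randn(2, 64, 32)
# wq, wk, wv = (torch.randn(32, 32) for _ in range(3))
# y = self_refinement_attention(x, wq, wk, wv, top_k=16)  # (2, 64, 32)
```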
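The wavelet fusion module is likewise only named in the abstract. The sketch below shows one plausible frequency-domain fusion under the assumption of a single-level Haar transform, taking the low-frequency band from the shallow (structural) feature and the detail bands from the deep (detailed) feature; the function names and the fusion rule are hypothetical.

```python
import torch

def haar_dwt(x):
    """Single-level Haar DWT for (B, C, H, W) features with even H and W."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency structure
    lh = (-a - b + c + d) / 2  # detail subband
    hl = (-a + b - c + d) / 2  # detail subband
    hh = (a - b - c + d) / 2   # diagonal detail subband
    return ll, lh, hl, hh

def haar_idwt(ll, lh, hl, hh):
    """Inverse of haar_dwt (perfect reconstruction)."""
    a = (ll - lh - hl + hh) / 2
    b = (ll - lh + hl - hh) / 2
    c = (ll + lh - hl - hh) / 2
    d = (ll + lh + hl + hh) / 2
    out = ll.new_zeros(*ll.shape[:2], ll.shape[-2] * 2, ll.shape[-1] * 2)
    out[..., 0::2, 0::2] = a
    out[..., 0::2, 1::2] = b
    out[..., 1::2, 0::2] = c
    out[..., 1::2, 1::2] = d
    return out

def wavelet_fusion(shallow, deep):
    """Hypothetical fusion rule: structure from the shallow feature,
    details from the deep feature (an assumption, not the paper's rule)."""
    ll_s, _, _, _ = haar_dwt(shallow)
    _, lh_d, hl_d, hh_d = haar_dwt(deep)
    return haar_idwt(ll_s, lh_d, hl_d, hh_d)
```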
Keywords:
Computer Vision: CV: Biometrics, face, gesture and pose recognition
Computer Vision: CV: Computational photography