On Efficient Transformer-Based Image Pre-training for Low-Level Vision

Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 1089-1097. https://doi.org/10.24963/ijcai.2023/121

Pre-training has produced numerous state-of-the-art results in high-level computer vision, yet few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles in different low-level tasks. For example, pre-training introduces more local information to intermediate layers in super-resolution (SR), yielding significant performance gains, whereas it hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different pre-training methods, revealing that multi-related-task pre-training is more effective and data-efficient than the alternatives. Finally, we extend our study to varying data scales and model sizes, as well as to comparisons between transformers and CNNs. Based on this study, we develop state-of-the-art models for multiple low-level tasks.
Keywords:
Computer Vision: CV: Computational photography
Computer Vision: CV: Applications
Computer Vision: CV: Representation learning