Squeezing Context into Patches: Towards Memory-Efficient Ultra-High Resolution Semantic Segmentation

Wang Liu, Puhong Duan, Xudong Kang, Shutao Li

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 1603-1611. https://doi.org/10.24963/ijcai.2025/179

Segmenting ultra-high-resolution (UHR) images is challenging because limited GPU memory forces a trade-off between detailed local information and comprehensive contextual understanding. Current UHR methods often employ a multi-branch encoder to handle local and contextual information, which can be memory-intensive. To achieve both high accuracy and low memory usage when processing UHR images, we introduce a memory-efficient semantic segmentation approach that squeezes context information into local patches (SCPSeg). Our method integrates the processing of local and contextual information within a single-branch encoder. Specifically, we introduce a context squeezing module (CSM) designed to compress global context details into local patches, enabling segmentation networks to perceive broader image contexts. Additionally, we propose a super-resolution guided local feature alignment (LFA) technique to improve segmentation precision by aligning local feature relationships. LFA calculates similarities only within sliding windows, avoiding heavy computational costs during the training phase. We evaluate the effectiveness of our proposed method on four widely used UHR segmentation benchmarks. Experimental results demonstrate that our approach enhances UHR segmentation accuracy without incurring additional memory overhead during the inference stage. The code is available at https://github.com/StuLiu/SCPSeg.
Keywords:
Computer Vision: CV: Segmentation, grouping and shape analysis
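
The abstract notes that LFA calculates similarities within sliding windows rather than across the whole feature map. The abstract does not give the actual formulation, so the following is only a minimal sketch of that general idea under stated assumptions: cosine similarity, a 3 x 3 window, and a mean-absolute-difference alignment term are all illustrative choices, and the function names are hypothetical rather than taken from the SCPSeg code.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def windowed_similarity(feat, k=3):
    """For each position of an H x W grid of feature vectors, return the cosine
    similarity to every in-bounds neighbour in the k x k window centred on it.

    feat: list of H rows, each a list of W feature vectors (lists of floats).
    Returns: H x W grid of dicts mapping (dy, dx) offset -> similarity.
    """
    h, w = len(feat), len(feat[0])
    r = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sims = {}
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        sims[(dy, dx)] = cosine(feat[y][x], feat[ny][nx])
            row.append(sims)
        out.append(row)
    return out

def alignment_loss(sim_a, sim_b):
    """Assumed alignment term: mean absolute difference between two
    windowed-similarity grids computed on feature maps of the same size."""
    total, count = 0.0, 0
    for row_a, row_b in zip(sim_a, sim_b):
        for sa, sb in zip(row_a, row_b):
            for off in sa:
                total += abs(sa[off] - sb[off])
                count += 1
    return total / count

def pair_counts(h, w, k):
    """Similarity entries needed: all pairs vs. k x k windows (upper bound)."""
    n = h * w
    return n * n, n * k * k
```

The point of restricting the comparison to windows is the entry count: for a 128 x 128 feature map, `pair_counts(128, 128, 3)` gives 268,435,456 entries for full pairwise similarity against at most 147,456 for 3 x 3 windows.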