Hierarchical Semantic Contrast for Weakly Supervised Semantic Segmentation

Yuanchen Wu, Xiaoqiang Li, Songmin Dai, Jide Li, Tong Liu, Shaorong Xie

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 1542-1550. https://doi.org/10.24963/ijcai.2023/171

Weakly supervised semantic segmentation (WSSS) with image-level annotations has achieved great progress through class activation maps (CAMs). Since vanilla CAMs can hardly serve as guidance to bridge the gap between full and weak supervision, recent studies explore semantic representations to make CAMs fit for WSSS and demonstrate encouraging results. However, they generally exploit single-level semantics, which may hinder the model from learning a comprehensive semantic structure. Motivated by the prior that each image has multiple levels of semantics, we propose hierarchical semantic contrast (HSC) to ameliorate this problem. It conducts semantic contrast from coarse-grained to fine-grained perspectives, covering the ROI level, class level, and pixel level, enabling the model to learn a better understanding of object patterns. To further improve CAM quality, building upon HSC, we explore consistency regularization of cross supervision and develop momentum prototype learning to utilize abundant semantics across different images. Extensive experiments show that our plug-and-play learning paradigm, HSC, significantly boosts CAM quality on both non-saliency-guided and saliency-guided baselines, and establishes new state-of-the-art WSSS performance on the PASCAL VOC 2012 dataset. Code is available at https://github.com/Wu0409/HSC_WSSS.
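The sketch below is a rough illustration of two ingredients named in the abstract, momentum prototype learning and pixel-level semantic contrast; it is not the authors' implementation (see the linked repository for that), and the class count, feature dimension, momentum, and temperature values are assumptions chosen for illustration.

```python
# Illustrative sketch only: an EMA-updated class prototype bank and a
# pixel-to-prototype contrastive loss. All hyperparameters are assumed.
import torch
import torch.nn.functional as F

class PrototypeBank:
    def __init__(self, num_classes, dim, momentum=0.99):
        self.momentum = momentum
        self.prototypes = F.normalize(torch.randn(num_classes, dim), dim=1)

    @torch.no_grad()
    def update(self, feats, labels):
        """EMA-update each class prototype with the mean feature of that class."""
        for c in labels.unique():
            mean_feat = F.normalize(feats[labels == c].mean(0), dim=0)
            self.prototypes[c] = F.normalize(
                self.momentum * self.prototypes[c] + (1 - self.momentum) * mean_feat,
                dim=0,
            )

def pixel_prototype_contrast(pixel_feats, pixel_labels, prototypes, tau=0.1):
    """InfoNCE-style loss pulling each pixel embedding toward its class prototype
    and pushing it away from the other class prototypes."""
    pixel_feats = F.normalize(pixel_feats, dim=1)      # (N, D)
    logits = pixel_feats @ prototypes.t() / tau        # (N, C)
    return F.cross_entropy(logits, pixel_labels)

# Usage: features come from a backbone, pseudo-labels from CAMs (both mocked here).
bank = PrototypeBank(num_classes=21, dim=256)
feats = torch.randn(1024, 256)                         # sampled pixel embeddings
labels = torch.randint(0, 21, (1024,))                 # CAM-derived pseudo labels
loss = pixel_prototype_contrast(feats, labels, bank.prototypes)
bank.update(feats.detach(), labels)
```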
Keywords:
Computer Vision: CV: Segmentation
Computer Vision: CV: Representation learning
Computer Vision: CV: Scene analysis and understanding