Enhancing Nighttime Semantic Segmentation with Visual-Linguistic Priors and Wavelet Transform

Jianhou Zhou; Xiaolong Zhou; Sixian Chan; Zhaomin Chen; Xiaoqin Zhang

doi:10.24963/ijcai.2025/888

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

Main Track. Pages 7985-7993. https://doi.org/10.24963/ijcai.2025/888

Nighttime semantic segmentation is a critical yet challenging task in autonomous driving. Most existing methods are designed for daytime scenarios and perform poorly at night because of texture loss and reduced object visibility. Low-light enhancement has been applied before segmentation, but it fails to recover nighttime-specific details and introduces noise or loses delicate structures. Recent work shows that models trained on large-scale image-text pairs can leverage natural language priors to guide visual representation, achieving remarkable performance across various downstream visual tasks. However, how to employ visual-linguistic priors effectively for nighttime semantic segmentation remains underexplored. To address these issues, we propose Text-WaveletFormer, a novel end-to-end framework that integrates text prompts and wavelet-based texture enhancement. Specifically, to compensate for the low recognizability of objects in nighttime scenes, we design a Text-Image Fusion Module (TIFM) that incorporates textual priors to improve nighttime object recognition. In addition, to alleviate the lack of texture details in nighttime conditions, we introduce a Wavelet Guided Texture Amplifier Module (WTAM) that fuses wavelet and raw image features via cross-attention, restoring low-light details. Finally, extensive experiments on benchmarks including NightCity, NightCity-fine, BDD100K, and CityScapes demonstrate our method’s superior performance over existing approaches.
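
The idea of fusing wavelet and raw image features via cross-attention can be illustrated with a minimal NumPy sketch. This is not the paper's WTAM: the abstract does not specify the wavelet family, the token layout, or which stream supplies the queries. The sketch assumes a single-level Haar transform, 2x2 image patches as queries, and the three high-frequency subbands as keys and values. The function names (`haar_dwt2`, `cross_attention`, `fuse_wavelet_and_image`) and the random projection matrices, which stand in for learned weights, are hypothetical.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an (H, W) array (H, W even)."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # difference between the two rows of the block
    hl = (a - b + c - d) / 2.0  # difference between the two columns of the block
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries and keys/values come from different sources."""
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys.T / scale, axis=-1)
    return weights @ values, weights


def fuse_wavelet_and_image(img, dim=8, seed=0):
    """Assumed fusion layout: image patches attend to high-frequency wavelet tokens."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    _, lh, hl, hh = haar_dwt2(img)

    # One 3-dimensional wavelet token per 2x2 block (the three detail subbands).
    wave_tokens = np.stack([lh, hl, hh], axis=-1).reshape(-1, 3)
    # One 4-dimensional image token per 2x2 block (the raw pixels).
    patches = img.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)

    # Random projections stand in for learned weights (assumption for illustration).
    w_q = rng.standard_normal((4, dim)) / 2.0
    w_k = rng.standard_normal((3, dim)) / np.sqrt(3)
    w_v = rng.standard_normal((3, dim)) / np.sqrt(3)

    q = patches @ w_q
    attended, weights = cross_attention(q, wave_tokens @ w_k, wave_tokens @ w_v)
    # Residual connection: image features plus the attended texture features.
    return q + attended, weights


if __name__ == "__main__":
    image = np.random.default_rng(1).random((8, 8))
    fused, attn = fuse_wavelet_and_image(image)
    print(fused.shape, attn.shape)
```

In this layout the image stream keeps its own features through the residual connection, while the attention term adds texture cues drawn from the high-frequency subbands. A trained model would learn the projections and would typically operate on multi-channel feature maps rather than raw pixels.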

Keywords:

Multidisciplinary Topics and Applications: MTA: Computer games

Computer Vision: CV: Segmentation, grouping and shape analysis

Knowledge Representation and Reasoning: KRR: Learning and reasoning

Natural Language Processing: NLP: Information extraction