Template-based Uncertainty Multimodal Fusion Network for RGBT Tracking

Zhaodong Ding, Chenglong Li, Shengqing Miao, Jin Tang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 909-917. https://doi.org/10.24963/ijcai.2025/102

RGBT tracking aims to localize predefined targets in video sequences by effectively leveraging information from both visible light (RGB) and thermal infrared (TIR) modalities. However, the quality of each modality varies dynamically in complex scenes, and effectively perceiving modal quality for multimodal fusion remains a significant challenge. To address this challenge, we propose to exploit the reliability of the initial template to model the uncertainty of each modality, and design a novel template-based uncertainty computation framework for robust multimodal fusion in RGBT tracking. In particular, we introduce an Uncertainty-aware Multimodal Fusion Module (UMFM), which estimates the uncertainty of each modality from the correlation between the template and the search region within the Subjective Logic framework, aiming to achieve robust multimodal fusion. In addition, existing methods focus on dynamic template update while overlooking the potential role of a reliable initial template in the template updating process. To this end, we design a simple yet effective Contrastive Template Update Module (CTUM) that assesses the reliability of a new template by comparing its quality with that of the initial template. Extensive experiments show that our method outperforms existing approaches on four RGBT tracking benchmarks.
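To make the Subjective Logic idea concrete, the following minimal PyTorch sketch shows how a template-search correlation could be mapped to per-modality belief and uncertainty masses and used to weight the fusion, with a simple contrastive check against the initial template in the spirit of CTUM. The function names, the softplus evidence head, the prior weight W = 2, and the acceptance margin are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn.functional as F

W = 2.0  # non-informative prior weight of a binomial opinion in Subjective Logic

def modality_uncertainty(template_feat, search_feat):
    """Map template-search correlation to belief and uncertainty masses.

    template_feat, search_feat: (B, C) pooled embeddings for one modality.
    Returns (belief, uncertainty); by construction belief + uncertainty = 1
    when there is no explicit disbelief evidence.
    """
    corr = F.cosine_similarity(template_feat, search_feat, dim=-1)  # (B,)
    evidence = F.softplus(corr)            # non-negative evidence from correlation
    belief = evidence / (evidence + W)     # belief mass
    u = W / (evidence + W)                 # uncertainty mass
    return belief, u

def uncertainty_aware_fusion(feat_rgb, feat_tir, u_rgb, u_tir):
    """Weight each modality's features by its confidence (1 - uncertainty)."""
    conf = torch.stack([1.0 - u_rgb, 1.0 - u_tir], dim=-1)  # (B, 2)
    w = torch.softmax(conf, dim=-1)
    return w[..., 0:1] * feat_rgb + w[..., 1:2] * feat_tir

def accept_new_template(init_score, new_score, margin=0.1):
    """Contrastive template check (hypothetical): accept the candidate
    template only if its template-search correlation is not clearly worse
    than that of the initial template, which acts as a reliability anchor."""
    return new_score >= init_score - margin

In this sketch, a modality whose template correlates weakly with the search region accumulates little evidence, so its uncertainty mass grows and its fusion weight shrinks, which is one plausible way to realize the quality-aware fusion the abstract describes.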
Keywords:
Computer Vision: CV: Motion and tracking
Computer Vision: CV: Multimodal learning