Multi-Stream Deep Similarity Learning Networks for Visual Tracking

Kunpeng Li, Yu Kong, Yun Fu

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2166-2172. https://doi.org/10.24963/ijcai.2017/301

Visual tracking has achieved remarkable success in recent decades, but it remains a challenging problem because of appearance variations over time and complex, cluttered backgrounds. In this paper, we adopt a tracking-by-verification scheme to overcome these challenges by determining the patch in the subsequent frame that is most similar to the target template and most distinct from the background context. A multi-stream deep similarity learning network is proposed to learn the similarity comparison model. The loss function of our network encourages the distance between a positive patch in the search region and the target template to be smaller than the distance between the positive patch and the background patches. Within the learned feature space, even if the distance between positive patches becomes large because of appearance change or interference from background clutter, our method can use the relative distance to distinguish the target robustly. In addition, the learned model is used directly for tracking with no model updating or parameter fine-tuning, and it runs at 45 fps on a single GPU. Our tracker achieves state-of-the-art performance on the visual tracking benchmark compared with other recent real-time trackers, and shows better capability in handling background clutter, occlusion and appearance change.
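As an illustration of the relative-distance objective described in the abstract, the following is a minimal sketch of a hinge-style loss over embedded patches. It is not the paper's implementation: the function names, the squared Euclidean distance, the margin value and the averaging over background patches are all assumptions made for the example, since the abstract does not specify them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relative_distance_loss(template, positive, backgrounds, margin=1.0):
    """Hypothetical hinge-style loss over embedded patches.

    The loss is zero once the positive patch is closer to the template
    than to every background patch by at least `margin` (an assumed
    hyperparameter). Inputs are feature vectors, not raw image patches.
    """
    d_pos = squared_distance(positive, template)
    # One hinge term per background patch, averaged (averaging is an assumption).
    terms = [
        max(0.0, margin + d_pos - squared_distance(positive, bg))
        for bg in backgrounds
    ]
    return sum(terms) / len(terms)


# Toy 2-D features: one distant background patch and one close to the positive.
template = [0.0, 0.0]
positive = [1.0, 0.0]
backgrounds = [[5.0, 0.0], [1.0, 1.0]]
loss = relative_distance_loss(template, positive, backgrounds)
```

In this toy case only the background patch close to the positive contributes to the loss, which reflects the idea that the objective depends on relative distances rather than on the absolute distance to the template.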
Keywords:
Machine Learning: Machine Learning
Machine Learning: Deep Learning
Robotics and Vision: Robotics and Vision