Do not Lose the Details: Reinforced Representation Learning for High Performance Visual Tracking

Qiang Wang, Mengdan Zhang, Junliang Xing, Jin Gao, Weiming Hu, Steve Maybank

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 985-991. https://doi.org/10.24963/ijcai.2018/137

This work presents a novel end-to-end trainable CNN model for high-performance visual object tracking. It learns both low-level fine-grained representations and a high-level semantic embedding space in a mutually reinforcing way, and a multi-task learning strategy is proposed to perform correlation analysis on representations from both levels. In particular, a fully convolutional encoder-decoder network is designed to reconstruct the original visual features from the semantic projections so as to preserve the geometric information. Moreover, the correlation filter layer operating on the fine-grained representations leverages a global context constraint for accurate object appearance modeling. The correlation filter in this layer is updated online efficiently without network fine-tuning. The proposed tracker therefore benefits from two complementary effects: the adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding. Extensive experimental evaluations on four popular benchmarks demonstrate its state-of-the-art performance.
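The abstract notes that the correlation filter layer is updated online efficiently without network fine-tuning. Such filters admit a closed-form solution in the Fourier domain with a running-average update. The sketch below illustrates this general idea with a single-channel MOSSE-style filter; it is a simplified assumption for illustration, not the paper's multi-channel, context-constrained formulation, and all names here are hypothetical.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Centered 2-D Gaussian regression target."""
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    g = np.exp(-(yy ** 2 + xx ** 2) / (2 * sigma ** 2))
    # Shift the peak to (0, 0) to match the FFT origin convention.
    return np.fft.ifftshift(g)

class CorrelationFilter:
    """Single-channel discriminative correlation filter with an
    exponential-moving-average online update (MOSSE-style sketch)."""

    def __init__(self, lam=1e-2, lr=0.02):
        self.lam = lam    # ridge regularization weight
        self.lr = lr      # online learning rate
        self.num = None   # running numerator:   conj(X) * Y
        self.den = None   # running denominator: conj(X) * X + lam

    def update(self, patch, label):
        # Closed-form ridge regression in the Fourier domain,
        # blended with previous frames (no gradient fine-tuning).
        X = np.fft.fft2(patch)
        Y = np.fft.fft2(label)
        num = np.conj(X) * Y
        den = np.conj(X) * X + self.lam
        if self.num is None:
            self.num, self.den = num, den
        else:
            self.num = (1 - self.lr) * self.num + self.lr * num
            self.den = (1 - self.lr) * self.den + self.lr * den

    def respond(self, patch):
        # Correlate the filter with a new patch; the response peak
        # gives the estimated target translation.
        X = np.fft.fft2(patch)
        H = self.num / self.den
        return np.real(np.fft.ifft2(H * X))

rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
label = gaussian_label(32, 32)
cf = CorrelationFilter()
cf.update(patch, label)
response = cf.respond(patch)
peak = np.unravel_index(np.argmax(response), response.shape)
```

Because the training patch is re-correlated with its own filter, the response peak lands at the origin, i.e. zero translation; in tracking, the peak offset on a new frame gives the motion estimate.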
Keywords:
Computer Vision: Motion and Tracking
Computer Vision: Computer Vision