Scene Text Detection in Video by Learning Locally and Globally
Shu Tian, Wei-Yi Pei, Ze-Yu Zuo, Xu-Cheng Yin
Text extraction from scene videos, whether for robots or for human users, faces a variety of grand challenges, e.g., heterogeneous backgrounds, varied text, nonuniform illumination, arbitrary motion, and poor contrast. Most previous video text detection methods rely on local information, i.e., information within individual frames, which limits their performance. In this paper, we propose a unified tracking-based text detection system that learns locally and globally and uniformly integrates detection, tracking, recognition, and their interactions. In this system, scene text is first detected locally in individual frames. Second, an optimal tracking trajectory is learned and linked globally by dynamic programming, using all detection, recognition, and prediction information. With the tracking trajectory, the final detection and tracking results are obtained simultaneously and immediately. Moreover, the proposed techniques are extensively evaluated on several public scene video text databases, where they substantially outperform state-of-the-art methods.
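The global linking step can be illustrated with a minimal dynamic-programming sketch. It is not the system described above: the abstract does not specify the cost function, so this example assumes each frame provides candidate boxes with a detection confidence, and it uses box overlap (IoU) between consecutive frames as a stand-in for the recognition and prediction cues. The names `link_trajectory`, `iou`, and `overlap_weight` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Candidate = Tuple[Box, float]            # (box, detection confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_trajectory(frames: List[List[Candidate]],
                    overlap_weight: float = 1.0) -> List[int]:
    """Pick one candidate per frame so that the sum of confidences plus
    overlap_weight * IoU between consecutive picks is maximal.

    Assumed scoring, chosen for illustration only.
    """
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one candidate")

    # best[i]: score of the best path ending at candidate i of the current frame
    best = [conf for _, conf in frames[0]]
    back: List[List[int]] = []  # back[t-1][i]: best predecessor of candidate i in frame t

    for t in range(1, len(frames)):
        prev = frames[t - 1]
        new_best, pointers = [], []
        for box, conf in frames[t]:
            scores = [best[j] + overlap_weight * iou(prev[j][0], box)
                      for j in range(len(prev))]
            j_star = max(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[j_star] + conf)
            pointers.append(j_star)
        best = new_best
        back.append(pointers)

    # Backtrack from the best final candidate to recover the trajectory.
    k = max(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path
```

With a positive `overlap_weight`, a candidate that scores slightly lower in one frame can still be selected if it fits a spatially consistent track, which is the intuition behind using global rather than per-frame information. Setting `overlap_weight` to 0 removes the pairwise term, so only per-frame confidences drive the choice.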