Multi-task Layout Analysis for Historical Handwritten Documents Using Fully Convolutional Networks

Yue Xu, Fei Yin, Zhaoxiang Zhang, Cheng-Lin Liu

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 1057-1063. https://doi.org/10.24963/ijcai.2018/147

Layout analysis is a fundamental process in document image analysis and understanding. It consists of several sub-processes, such as page segmentation, text line segmentation and baseline detection. In this work, we propose a multi-task layout analysis method that uses a single fully convolutional network (FCN) to solve these three problems simultaneously. The FCN is trained to segment the document image into different regions and to detect the center line of each text line by classifying pixels into different categories. Through supervised learning on document images with pixel-wise labels, the FCN can extract discriminative features and classify pixels accurately. After pixel-wise classification, post-processing steps are applied to reduce noise, correct wrong segmentations and identify overlapping regions. Experimental results on the public dataset DIVA-HisDB, which contains challenging medieval manuscripts, demonstrate the effectiveness and superiority of the proposed method.
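The pixel-wise classification and noise-reduction steps described above can be illustrated with a minimal sketch. It assumes the FCN outputs one score per class for every pixel, takes the per-pixel argmax to obtain a label map, and then removes small 4-connected components by relabeling them as background. Small-component filtering is a common denoising technique chosen here for illustration only; it is not claimed to be the authors' exact post-processing, and it does not cover correcting wrong segmentations or finding overlapping regions. The function names, the `BACKGROUND` index and the `min_size` threshold are hypothetical.

```python
from collections import deque

# Hypothetical background class index (assumption; the abstract does not list the label set).
BACKGROUND = 0


def argmax_labels(scores):
    """Convert per-pixel class scores (H x W x C nested lists) into an H x W label map."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in scores]


def remove_small_components(labels, min_size):
    """Relabel 4-connected same-class components smaller than min_size as background.

    Returns a new label map; the input is not modified.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if seen[y][x] or labels[y][x] == BACKGROUND:
                continue
            cls = labels[y][x]
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            # Breadth-first flood fill over pixels of the same class.
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and labels[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Treat tiny isolated blobs as noise.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = BACKGROUND
    return out


if __name__ == "__main__":
    labels = [[1, 1, 0, 2],
              [1, 1, 0, 0],
              [0, 0, 0, 0]]
    print(remove_small_components(labels, min_size=2))
```

In this toy label map, the single isolated pixel of class 2 is treated as noise and reset to background, while the four-pixel class-1 region is kept. In practice the threshold would be tuned per class and per image resolution.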
Keywords:
Computer Vision: Language and Vision
Computer Vision: Computer Vision