Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Survey Track. Pages 5426-5435. https://doi.org/10.24963/ijcai.2022/761

Following the success of pre-training techniques in the natural language domain, a flurry of table pre-training frameworks have been proposed and have achieved new state-of-the-art results on various downstream tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction. Various model architectures have been explored to best capture the characteristics of (semi-)structured tables, in particular specially designed attention mechanisms. Moreover, to fully leverage the supervision signals in unlabeled tables, diverse pre-training objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships, and learning a neural SQL executor. This survey aims to provide a comprehensive review of model designs, pre-training objectives, and downstream tasks for table pre-training, and we further share our thoughts on existing challenges and future opportunities.
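To make the "denoising cell values" family of objectives mentioned above concrete, the following minimal sketch shows one way such self-supervision can be generated from an unlabeled table: randomly mask a fraction of cells and keep the original values as recovery targets. This is an illustrative toy example, not the procedure of any specific framework covered by the survey; all names (corrupt_table, MASK, mask_prob) are hypothetical.

```python
# Toy sketch of a cell-value denoising objective:
# corrupt some cells of a table and record the values a model would
# be trained to recover. Hypothetical code, not from the survey.
import random

MASK = "[MASK]"

def corrupt_table(table, mask_prob=0.15, seed=0):
    """table: list of rows, each row a list of cell strings.
    Returns (corrupted, targets) where targets maps (row, col) -> original value."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for r, row in enumerate(table):
        new_row = []
        for c, cell in enumerate(row):
            if rng.random() < mask_prob:
                targets[(r, c)] = cell   # ground truth the pre-training loss would use
                new_row.append(MASK)
            else:
                new_row.append(cell)
        corrupted.append(new_row)
    return corrupted, targets

if __name__ == "__main__":
    table = [["Country", "Population"],
             ["France", "67M"],
             ["Japan", "125M"]]
    corrupted, targets = corrupt_table(table, mask_prob=0.5)
    print(corrupted)  # table with some cells replaced by [MASK]
    print(targets)    # (row, col) -> original cell value
```

In practice, the corrupted table would be linearized or encoded with a structure-aware model, and the loss would compare the model's predictions for masked positions against the recorded targets.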
Keywords:
Survey Track: Knowledge Representation and Reasoning
Survey Track: Natural Language Processing
Survey Track: Data Mining