Gated POS-Level Language Model for Authorship Verification

Gated POS-Level Language Model for Authorship Verification

Linshu Ouyang, Yongzheng Zhang, Hui Liu, Yige Chen, Yipeng Wang

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 4025-4031. https://doi.org/10.24963/ijcai.2020/557

Authorship verification is an important problem that has many applications. The state-of-the-art deep authorship verification methods typically leverage character-level language models to encode author-specific writing styles. However, they often fail to capture syntactic level patterns, leading to sub-optimal accuracy in cross-topic scenarios. Also, due to imperfect cross-author parameter sharing, it's difficult for them to distinguish author-specific writing style from common patterns, leading to data-inefficient learning. This paper introduces a novel POS-level (Part of Speech) gated RNN based language model to effectively learn the author-specific syntactic styles. The author-agnostic syntactic information obtained from the POS tagger pre-trained on large external datasets greatly reduces the number of effective parameters of our model, enabling the model to learn accurate author-specific syntactic styles with limited training data. We also utilize a gated architecture to learn the common syntactic writing styles with a small set of shared parameters and let the author-specific parameters focus on each author's special syntactic styles. Extensive experimental results show that our method achieves significantly better accuracy than state-of-the-art competing methods, especially in cross-topic scenarios (over 5\% in terms of AUC-ROC).
Keywords:
Natural Language Processing: Natural Language Processing
Natural Language Processing: NLP Applications and Tools
Data Mining: Mining Text, Web, Social Media
Natural Language Processing: Other