Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation

Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 4567-4573. https://doi.org/10.24963/ijcai.2018/635

The sequence-to-sequence (Seq2Seq) approach has gained great attention in the field of single-turn dialogue generation. However, one serious problem is that most existing Seq2Seq-based models tend to generate common responses that lack specific meaning. Our analysis shows that the underlying reason is that Seq2Seq training is equivalent to optimizing the Kullback–Leibler (KL) divergence, and thus does not penalize the case where the generated probability is high while the true probability is low. However, the true probability is unknown, which poses a challenge for tackling this problem. Inspired by the fact that the coherence (i.e., similarity) between post and response is consistent with human evaluation, we hypothesize that the true probability of a response is proportional to its coherence degree. The coherence scores are then used as the reward function in a reinforcement learning framework to penalize responses whose generated probability is high while the true probability is low. Three types of coherence models are proposed in this paper: an unlearned similarity function, a pretrained semantic matching function, and an end-to-end dual learning architecture. Experimental results on both the Chinese Weibo dataset and the English Subtitle dataset show that the proposed models produce more specific and meaningful responses, outperforming Seq2Seq models in terms of both metric-based and human evaluations.
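As a rough sketch of the argument in the abstract (using standard notation that is not taken from the paper: $P_{\text{true}}$ for the data distribution, $P_\theta$ for the Seq2Seq model, $R_{\text{coh}}$ for the coherence reward, and $b$ for a baseline), maximum-likelihood Seq2Seq training minimizes a KL divergence whose expectation is taken under the true distribution, so responses to which the model assigns high probability but which are unlikely under the data contribute little to the loss:

\[
\min_\theta \; \mathbb{E}_{(x,y)\sim P_{\text{true}}}\!\left[-\log P_\theta(y\mid x)\right]
\;=\;
\min_\theta \; \mathbb{E}_{x}\!\left[\,\mathrm{KL}\!\left(P_{\text{true}}(\cdot\mid x)\,\big\|\,P_\theta(\cdot\mid x)\right)\right] + \text{const}.
\]

The reinforcement-learning remedy described above can then be sketched as a policy-gradient (REINFORCE-style) update in which the coherence score of a sampled response $\hat{y}$ acts as the reward:

\[
\nabla_\theta J(\theta) \;\approx\; \left(R_{\text{coh}}(x,\hat{y}) - b\right)\,\nabla_\theta \log P_\theta(\hat{y}\mid x),
\]

so that samples with low coherence, used as a proxy for low true probability, have their generation probability pushed down even when the model currently assigns them high probability.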
Keywords:
Machine Learning: Neural Networks
Natural Language Processing: Dialogue
Natural Language Processing: Natural Language Generation
Machine Learning: Deep Learning
Machine Learning: Learning Generative Models