Multi-Grained Role Labeling Based on Multi-Modality Information for Real Customer Service Telephone Conversation
Weizhi Ma, Min Zhang, Yiqun Liu, Shaoping Ma
Large-scale customer service call records contain a wealth of valuable information for business intelligence. However, these records have not been well exploited for analysis, even in the big data era. Two fundamental problems must be solved before they can be mined and analyzed: 1) each telephone conversation interleaves the utterances of agents and users, which must be distinguished before analysis; and 2) the speakers in a conversation do not belong to a predefined set. These problems pose new challenges that have not been well studied in previous work. In this paper, we propose a four-phase framework for role labeling in real customer service telephone conversations that integrates multi-modality features, i.e., both low-level acoustic features and semantic-level textual features. First, we conduct speaker diarization based on the change in the Bayesian Information Criterion (ΔBIC) to split an audio stream into two clusters of segments. Second, the segments are transcribed into text in an Automatic Speech Recognition (ASR) phase using a deep learning model, DNN-HMM. Third, by integrating acoustic and textual features, dialog-level role labeling maps the two clusters to the agent and the user. Finally, sentence-level role correction refines the labels at a finer granularity, correcting errors made in the previous phases. The proposed framework is evaluated on two real datasets of mobile and bank customer service calls. The precision of dialog-level labeling exceeds 99.0%. At the sentence level, labeling accuracy reaches 90.4%, greatly outperforming a traditional acoustic-feature-based method, which achieves only 78.5% accuracy.
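To make the first phase more concrete, the sketch below computes a ΔBIC score for splitting a window of acoustic feature frames (assumed here to be an N x d array, e.g. MFCC-like vectors) at a candidate frame, using the commonly used formulation with one full-covariance Gaussian versus two. It is not the authors' implementation: the function names `delta_bic` and `best_change_point`, the penalty weight `lam`, and the `min_frames` guard are hypothetical, and the sketch covers only single change-point detection, not the full clustering into two speaker groups.

```python
import numpy as np


def _logdet_cov(Y: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood covariance of the rows of Y."""
    cov = np.atleast_2d(np.cov(Y, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance: segment too short or degenerate")
    return float(logdet)


def delta_bic(X: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Hypothetical helper: ΔBIC for splitting frames X (N x d) at index i.

    Compares one full-covariance Gaussian for all of X against two Gaussians
    for X[:i] and X[i:]. A positive value favours the two-model hypothesis,
    i.e. a likely speaker change at frame i.
    """
    n, d = X.shape
    n1, n2 = i, n - i
    gain = 0.5 * (n * _logdet_cov(X)
                  - n1 * _logdet_cov(X[:i])
                  - n2 * _logdet_cov(X[i:]))
    # Penalty for the extra Gaussian: d mean + d(d+1)/2 covariance parameters.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - lam * penalty


def best_change_point(X: np.ndarray, lam: float = 1.0, min_frames: int = 0):
    """Scan candidate splits; return (index, score) of the best positive ΔBIC, or None."""
    n, d = X.shape
    # Each side needs at least d + 1 frames for a full-rank covariance estimate.
    min_frames = max(min_frames, d + 1)
    best = None
    for i in range(min_frames, n - min_frames + 1):
        score = delta_bic(X, i, lam)
        if score > 0 and (best is None or score > best[1]):
            best = (i, score)
    return best


if __name__ == "__main__":
    # Synthetic example: two "speakers" with clearly different feature means.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                   rng.normal(5.0, 1.0, (200, 2))])
    print(best_change_point(X, min_frames=20))
```

On this synthetic input the scan should report a split close to frame 200. Very short segments give noisy covariance estimates, so a larger `min_frames` is usually safer in practice. In BIC-based clustering the same score is often reused in the opposite direction: a negative ΔBIC between two segments suggests they come from the same speaker and can be merged.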