A De Novo Divide-and-Merge Paradigm for Acoustic Model Optimization in Automatic Speech Recognition

Conghui Tan, Di Jiang, Jinhua Peng, Xueyang Wu, Qian Xu, Qiang Yang

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3709-3715. https://doi.org/10.24963/ijcai.2020/513

Due to the rising awareness of privacy protection and the voluminous scale of speech data, it is becoming infeasible for Automatic Speech Recognition (ASR) system developers to train acoustic models on the complete data as before. In this paper, we propose a novel Divide-and-Merge paradigm to address these salient problems in the ASR field. In the Divide phase, multiple acoustic models are trained on different subsets of the complete speech data; in the Merge phase, two novel algorithms are used to generate a single high-quality acoustic model from those subset-trained models. We first propose the Genetic Merge Algorithm (GMA), a highly specialized algorithm for optimizing acoustic models that nevertheless suffers from low efficiency. We then propose the SGD-Based Optimizational Merge Algorithm (SOMA), which effectively alleviates the efficiency bottleneck of GMA while maintaining superior performance. Extensive experiments on public data show that the proposed methods significantly outperform the state-of-the-art.
Keywords:
Natural Language Processing: Speech
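
The abstract describes the Merge phase only at a high level; the concrete forms of GMA and SOMA are given in the full paper. As a purely illustrative sketch, and not the paper's actual algorithms, the snippet below shows one simple way to combine acoustic models trained on separate data subsets: interpolate their parameters with learnable weights and tune those weights by SGD on held-out data. The function merge_by_sgd and the inputs sub_models, held_out_batches, and loss_fn are hypothetical names introduced here for illustration.

import copy
import torch
import torch.nn.functional as F
from torch.func import functional_call


def merge_by_sgd(sub_models, held_out_batches, loss_fn, steps=200, lr=0.5):
    # One learnable logit per sub-model; softmax keeps the merge weights
    # non-negative and summing to one.
    logits = torch.zeros(len(sub_models), requires_grad=True)
    optimizer = torch.optim.SGD([logits], lr=lr)
    # Detach sub-model parameters so gradients flow only into the logits.
    param_dicts = [{n: p.detach() for n, p in m.named_parameters()}
                   for m in sub_models]

    for step in range(steps):
        weights = F.softmax(logits, dim=0)
        # Weighted average of all sub-model parameters (graph kept for backprop).
        merged_params = {
            name: sum(w * pd[name] for w, pd in zip(weights, param_dicts))
            for name in param_dicts[0]
        }
        inputs, targets = held_out_batches[step % len(held_out_batches)]
        # Run the first sub-model's architecture with the interpolated parameters.
        outputs = functional_call(sub_models[0], merged_params, (inputs,))
        loss = loss_fn(outputs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Bake the learned interpolation into a single merged model.
    final_weights = F.softmax(logits.detach(), dim=0)
    merged = copy.deepcopy(sub_models[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            param.copy_(sum(w * pd[name]
                            for w, pd in zip(final_weights, param_dicts)))
    return merged

As their names suggest, GMA searches for the merge configuration with a genetic algorithm while SOMA replaces that search with SGD-based optimization; the sketch above only conveys the general idea of learning, on held-out data, how to merge models trained on data subsets.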