Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation

Jing Shi, Jiaming Xu, Guangcan Liu, Bo Xu

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 4353-4360. https://doi.org/10.24963/ijcai.2018/605

Recent deep learning methods have made significant progress in multi-talker mixed speech separation. However, most existing models adopt a driftless strategy, separating all the speech channels rather than selectively attending to the target one. As a result, these frameworks may fail to offer a satisfactory solution in complex auditory scenes, where the number of input sounds is usually uncertain and even dynamic. In this paper, we present a novel neural-network-based structure motivated by the top-down attention behavior of humans when facing a complicated acoustical scene. Different from previous works, our method constructs an inference-attention structure to predict the candidates of interest and extract each of their speech channels. Our work removes the limitation that the number of channels must be given in advance, as well as the high computational complexity caused by the label permutation problem. We evaluated our model on the WSJ0 mixed-speech tasks. In all experiments, our model is highly competitive, matching and even outperforming the baselines.
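
To make the infer-then-extract idea concrete, below is a minimal PyTorch-style sketch of a generic two-stage separator: an inference head proposes a candidate speaker from the mixture, an extraction head attends to that candidate and masks out its spectrogram, and the loop repeats on the residual until the inference head signals that no candidate remains, so the number of speakers need not be fixed in advance. All module names, dimensions, and the stopping rule are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class InferExtractSeparator(nn.Module):
    """Illustrative infer-then-extract separator (not the paper's exact model).

    Stage 1 ("think"): an inference head summarizes the mixture and proposes a
    candidate speaker embedding, plus a flag for whether any speaker is left.
    Stage 2 ("listen again"): an extraction head, conditioned on that embedding,
    predicts a mask that pulls the candidate's channel out of the residual.
    """

    def __init__(self, n_freq=129, hidden=256, emb=40, max_spk=4):
        super().__init__()
        self.encoder = nn.LSTM(n_freq, hidden, batch_first=True, bidirectional=True)
        self.infer = nn.Linear(2 * hidden, emb)              # candidate speaker embedding
        self.stop = nn.Linear(2 * hidden, 1)                 # "is anyone left to attend to?"
        self.extract = nn.Linear(2 * hidden + emb, n_freq)   # mask for that candidate
        self.max_spk = max_spk

    def forward(self, mix_spec):                             # (B, T, F) magnitude spectrogram
        residual, outputs = mix_spec, []
        for _ in range(self.max_spk):
            h, _ = self.encoder(residual)                    # (B, T, 2H) frame features
            pooled = h.mean(dim=1)                           # utterance-level summary
            if torch.sigmoid(self.stop(pooled)).mean() < 0.5:
                break                                        # inference head: no more speakers
            spk = self.infer(pooled)                         # top-down "thought" about the target
            spk_tiled = spk.unsqueeze(1).expand(-1, h.size(1), -1)
            mask = torch.sigmoid(self.extract(torch.cat([h, spk_tiled], dim=-1)))
            est = mask * residual                            # attend to and extract this speaker
            outputs.append(est)
            residual = residual - est                        # listen again to what is left
        return outputs

if __name__ == "__main__":
    model = InferExtractSeparator()
    sources = model(torch.rand(2, 100, 129))                 # 2 mixtures, 100 frames, 129 bins
    print(len(sources), sources[0].shape if sources else None)
```

In this toy setup the loop length is bounded by `max_spk` only as a safeguard; the stop flag, rather than a fixed speaker count, decides when extraction ends, which is the property the abstract emphasizes.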
Keywords:
Natural Language Processing: Speech
Machine Learning: Neural Networks
Machine Learning: Deep Learning