SQuAD-SRC: A Dataset for Multi-Accent Spoken Reading Comprehension

Yixuan Tang, Anthony K.H. Tung

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 5206-5214. https://doi.org/10.24963/ijcai.2023/578

Spoken Reading Comprehension (SRC) is a challenging problem in spoken natural language retrieval: given a question in audio form, the system automatically extracts the answer from text-form content. However, existing spoken question answering approaches are mainly based on synthetically generated audio data, so they may not transfer effectively to multi-accent spoken question answering in many real-world applications. In this paper, we construct SQuAD-SRC, a large-scale multi-accent human-spoken dataset, to study the problem of multi-accent spoken reading comprehension. We select 24 native English speakers from six different countries with various English accents, who record audio-form questions for the corresponding text-form content. The dataset consists of 98,169 spoken question-answer pairs and 20,963 passages from the popular machine reading comprehension dataset SQuAD. We present a statistical analysis of SQuAD-SRC and conduct extensive experiments on it, comparing cascaded SRC approaches with enhanced end-to-end ones. Moreover, we explore various adaptation strategies to improve SRC performance, especially for multi-accent spoken questions.
Keywords:
Natural Language Processing: NLP: Question answering
Natural Language Processing: NLP: Speech