Generating Robust Audio Adversarial Examples with Temporal Dependency

Hongting Zhang; Pan Zhou; Qiben Yan; Xiao-Yang Liu

doi:10.24963/ijcai.2020/438

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

Main track. Pages 3167-3173. https://doi.org/10.24963/ijcai.2020/438

Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually contain noticeable noise, especially during periods of silence and pauses. Moreover, the added noise often breaks the temporal dependency property of the original audio, which makes the examples easy to detect with state-of-the-art defense mechanisms. In this paper, we propose a new Iterative Proportional Clipping (IPC) algorithm that preserves temporal dependency in audio to generate more robust adversarial examples. We are motivated by the observation that temporal dependency in audio has a significant effect on human perception. Following this observation, we leverage a proportional clipping strategy to reduce noise during low-intensity periods. Experimental results and a user study both suggest that the generated adversarial examples contain significantly less human-perceptible noise and resist defenses based on temporal structure.
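
The proportional clipping idea can be illustrated with a minimal sketch. It is an assumption about what such a step might look like and is not taken from the paper: each perturbation sample is bounded by a fixed fraction of the original sample's magnitude, so silent or quiet stretches receive little or no added noise. The function name `proportional_clip`, the `ratio` parameter, and the toy waveform are all hypothetical, and the iterative optimization against an ASR model is omitted.

```python
def proportional_clip(original, perturbation, ratio=0.1):
    """Clip each perturbation sample to within ratio * |original sample|.

    Hypothetical helper for illustration only; the bound and the default
    ratio are assumptions, not values from the paper.
    """
    if len(original) != len(perturbation):
        raise ValueError("original and perturbation must have the same length")
    clipped = []
    for x, d in zip(original, perturbation):
        bound = ratio * abs(x)  # quiet samples allow only a small change
        clipped.append(max(-bound, min(bound, d)))
    return clipped


# Toy waveform: two silent samples followed by two louder ones.
original = [0.0, 0.0, 0.5, -0.8]
perturbation = [0.05, -0.05, 0.2, -0.01]

clipped = proportional_clip(original, perturbation, ratio=0.1)
adversarial = [x + d for x, d in zip(original, clipped)]
```

In this toy case the perturbation is removed entirely on the silent samples, capped on the third sample, and left unchanged on the fourth, where it was already inside the bound. In an iterative attack, a step like this would presumably be applied after every update of the perturbation.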

Keywords:

Machine Learning: Adversarial Machine Learning

Natural Language Processing: Speech