Temporal Pyramid Pooling Convolutional Neural Network for Cover Song Identification

Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4846-4852. https://doi.org/10.24963/ijcai.2019/673

Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment, and further breakthroughs are hard to achieve with them. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning for this task. We show that they can be naturally adapted to handle key transposition in cover songs. Additionally, Temporal Pyramid Pooling is used to extract information at different scales and to transform songs of different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that, combined with these techniques, our approach is robust to the musical variations found in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
Keywords:
Multidisciplinary Topics and Applications: Art and Music
Machine Learning: Deep Learning
Multidisciplinary Topics and Applications: Information Retrieval
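
The abstract's claim that Temporal Pyramid Pooling maps songs of different lengths to fixed-dimensional representations can be illustrated with a minimal sketch. It is not the authors' implementation: the function name temporal_pyramid_pool, the pyramid levels (1, 2, 4), and the use of max pooling within each temporal bin are assumptions made here for illustration. The input stands in for a CNN feature map with T time frames and C channels.

```python
def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Pool a variable-length sequence of feature vectors into a fixed-size vector.

    frames: list of T feature vectors, each a list of C numbers (T >= 1).
    levels: number of temporal bins per pyramid level (assumed values).
    Returns a flat list of length C * sum(levels), independent of T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    channels = len(frames[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            # Floor the start and ceil the end so that every bin holds at
            # least one frame, even when T is smaller than the bin count.
            start = (b * t) // bins
            end = max(((b + 1) * t + bins - 1) // bins, start + 1)
            segment = frames[start:end]
            # Max pooling over time within the bin (an assumption here).
            pooled.extend(max(f[c] for f in segment) for c in range(channels))
    return pooled


# Two "songs" of different lengths yield vectors of the same size.
short_song = [[1.0, 2.0]]
long_song = [[i, -i] for i in range(10)]
short_vec = temporal_pyramid_pool(short_song)
long_vec = temporal_pyramid_pool(long_song)
```

With two channels and levels (1, 2, 4), both outputs have 2 * (1 + 2 + 4) = 14 entries. The coarsest level summarizes the whole song, while finer levels retain rough temporal position.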