Intoner: For Chinese Poetry Intoning Synthesis

Heda Zuo, Liyao Sun, Zeyu Lai, Weitao You, Pei Chen, Lingyun Sun

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI, Arts & Creativity. Pages 10252-10260. https://doi.org/10.24963/ijcai.2025/1139

Chinese Poetry Intoning, which uses improvised melodies rather than fixed musical scores, is crucial for emotional expression and prosodic rendition. However, this cultural heritage is difficult to pass on because audio records are scant and domain experts are scarce. Existing text-to-speech models cannot generate melodious audio, while singing-voice-synthesis models rely on predetermined musical scores, so neither is suitable for intoning synthesis. Hence, we introduce Chinese Poetry Intoning Synthesis (PIS) as a novel task to reproduce intoning audio and preserve this age-old cultural art. For this task, we summarize three-level principles from poetry metrical patterns and construct Intoner, a diffusion-based PIS model, on top of them. We also collect a multi-style Chinese poetry intoning dataset of text-audio pairs accompanied by feature annotations. Experimental results show that our model effectively learns diverse intoning styles and contents and synthesizes more melodious and vibrant intoning audio. To the best of our knowledge, we are the first to work on the poetry intoning synthesis task.
Keywords:
Application domains: Music and sound
Application domains: Other domains of art or creativity
Methods and resources: Machine learning, deep learning, neural models, reinforcement learning