SPARC: An AI-Based Speech Processing and Real-Time Correction System
TingRay Chung, Pin-Yu Chen
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Demo Track. Pages 11017-11020.
https://doi.org/10.24963/ijcai.2025/1253
In audio narration and video production, maintaining clear and accurate dialogue is crucial. However, most work on dubbing over speech mistakes is done in post-production, which is often not applicable to live broadcasts. This project aims to develop a voice correction system that automatically detects and corrects speech errors in near real-time and integrates the corrected audio into the ongoing conversation without disrupting its natural flow. The system uses AI tools such as the Nous Hermes 2-Mistral 7B DPO large language model, which first generates the reference script for Coqui's XTTS-V2 zero-shot text-to-speech voice-cloning model. The generated correction then passes through a series of filters that replace the mistake and integrate the correction seamlessly. A user survey conducted in the experiment indicates that the corrected audio is of high quality.
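The abstract does not specify which filters are used to merge the correction into the live audio. As a minimal illustration of what "seamless integration" can involve, the sketch below splices a corrected segment into the original signal with a linear crossfade at each boundary. This is an assumed, swapped-in technique, not the authors' filter chain; the helper names `splice_correction` and `_crossfade`, and the representation of mono audio as a list of float samples, are hypothetical.

```python
from typing import List


def _crossfade(out_seg: List[float], in_seg: List[float]) -> List[float]:
    """Linearly fade out `out_seg` while fading in `in_seg` (equal lengths)."""
    n = len(out_seg)
    mixed = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming signal rises toward 1
        mixed.append(out_seg[i] * (1.0 - w) + in_seg[i] * w)
    return mixed


def splice_correction(original: List[float], correction: List[float],
                      start: int, end: int, fade: int) -> List[float]:
    """Hypothetical helper: replace original[start:end] with `correction`,
    blending `fade` samples at each boundary to avoid audible clicks."""
    if not 0 <= start <= end <= len(original):
        raise ValueError("segment bounds outside the original signal")
    # Clamp the fade so it never needs more samples than exist on either side.
    fade = max(0, min(fade, start, len(original) - end, len(correction) // 2))
    head = original[: start - fade]
    left = _crossfade(original[start - fade : start], correction[:fade])
    middle = correction[fade : len(correction) - fade]
    right = _crossfade(correction[len(correction) - fade :],
                       original[end : end + fade])
    tail = original[end + fade :]
    return head + left + middle + right + tail


if __name__ == "__main__":
    speech = [0.0] * 10   # stand-in for the live signal
    fix = [1.0] * 4       # stand-in for the cloned-voice correction
    print(splice_correction(speech, fix, start=4, end=6, fade=1))
```

Because each crossfade overlaps the correction with one sample of surrounding context, the output is shorter by `2 * fade` samples than a hard cut would produce. A real-time system would likely also need time alignment and loudness matching, which this sketch does not attempt.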
Keywords:
Humans and AI: HAI: Applications
Humans and AI: HAI: Intelligent user interfaces
