Hallucination-Aware Prompt Optimization for Text-to-Video Synthesis
Jiapeng Wang, Chengyu Wang, Jun Huang, Lianwen Jin
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI, Arts & Creativity. Pages 10198-10206.
https://doi.org/10.24963/ijcai.2025/1133
Rapid advances in AI-generated content (AIGC) have spurred extensive research on and application of deep text-to-video (T2V) synthesis models, such as OpenAI's Sora. These models typically rely on high-quality prompt-video pairs with detailed text prompts during training in order to produce high-quality videos. To boost the effectiveness of Sora-like T2V models, we introduce VidPrompter, a large multi-modal model that supports T2V applications with three key functionalities: (1) generating detailed prompts from raw videos, (2) enhancing prompts from videos grounded by short descriptions, and (3) refining simple user-provided prompts to elevate the quality of generated videos. We train VidPrompter with a hybrid multi-task paradigm and propose hallucination-aware direct preference optimization (HDPO) to improve the multi-modal, multi-task prompt optimization process. Experiments on a variety of tasks show that our method surpasses strong baselines and other competitors.
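As background for HDPO: the abstract does not specify the hallucination-aware modification, but direct preference optimization (DPO), which HDPO extends, trains a policy $\pi_\theta$ against a frozen reference model $\pi_{\mathrm{ref}}$ on preference pairs $(x, y_w, y_l)$ of preferred and dispreferred outputs:

$$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

where $\sigma$ is the sigmoid function and $\beta$ controls how far the policy may deviate from the reference. A hallucination-aware variant would presumably construct or weight the pairs $(y_w, y_l)$ using hallucination signals in the generated prompts; the specifics are given in the full paper.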
Keywords:
Methods and resources: Machine learning, deep learning, neural models, reinforcement learning
Methods and resources: AI systems for collaboration and co-creation
Application domains: Images, movies and visual arts
