Generative AI for Immersive Video: Recent Advances and Future Opportunities
Kaiyuan Hu, Yili Jin, Hao Zhou, Linfeng Du, Jiangchuan Liu, Xue Liu
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Survey Track. Pages 10464-10472.
https://doi.org/10.24963/ijcai.2025/1162
Immersive video is a key component of eXtended Reality (XR), which aims to create simulated virtual or hybrid environments and let users interact with them. This technology allows users to experience immersive sensations that transcend time and space, while continuously providing training data for emerging technologies such as Embodied AI. Thanks to advances in sensing, computing, and display, recent years have seen many excellent works on XR and related hardware and software systems. However, challenges such as high creation cost, lack of immersion, and limited scalability hinder the practical application of immersive video services. Recently emerged generative artificial intelligence (GenAI) offers new insights into tackling these challenges. In this paper, we conduct a comprehensive survey of recent advances and future opportunities in how GenAI can benefit immersive video services. By introducing a systematic taxonomy, we classify the pertinent techniques and applications into three well-defined categories aligned with the pipeline of immersive video services: content creation, network delivery, and client-side display. This categorization enables a structured exploration of the diverse roles GenAI can play in immersive video services, providing a framework for a more comprehensive understanding and evaluation of these technologies. To the best of our knowledge, this work is the first systematic survey of GenAI in XR settings, laying a foundation for future research in this interdisciplinary domain.
Keywords:
Computer Vision: CV: 3D computer vision
Computer Vision: CV: Image and video synthesis and generation
Computer Vision: CV: Representation learning
Multidisciplinary Topics and Applications: MTA: Interactive entertainment
