Multimodal Knowledge Retrieval-Augmented Iterative Alignment for Satellite Commonsense Conversation

Multimodal Knowledge Retrieval-Augmented Iterative Alignment for Satellite Commonsense Conversation

Qian Li, Xuchen Li, Zongyu Chang, Yuzheng Zhang, Cheng Ji, Shangguang Wang

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 8168-8176. https://doi.org/10.24963/ijcai.2025/908

Satellite technology has significantly influenced our daily lives, manifested in applications such as navigation and communication. With its development, a vast amount of multimodal satellite commonsense data has been generated, thus leading to an urgent demand for conversation about satellite data. However, existing large language models suffer from prevalent hallucinations and poor comprehensibility on multimodal satellite data due to their high professional content threshold and partial information opacity. To address these issues, we propose a multimodal satellite knowledge retrieval-augmented iterative alignment framework (Sat-RIA) for satellite commonsense conversation. We first construct multi-view retrieval expert knowledge to reduce hallucinations and enhance the interpretability of responses, which incorporates the satellite expert database, satellite rule, satellite image database, and a satellite knowledge graph. We next design commonsense conversation instructions to make the answers more legible and understandable. Furthermore, the retrieval-augmented iterative alignment module refines response precision by aligning outputs with task-specific standards through multi-stage evaluations. Finally, we construct satellite multi-turn dialogue and visual question-answer datasets for a more comprehensive evaluation of satellite commonsense conversation. Experimental results demonstrate that Sat-RIA outperforms existing large language models and provides more comprehensible answers with fewer hallucinations.
Keywords:
Natural Language Processing: NLP: Applications
Natural Language Processing: NLP: Other