Enhancing Automated Grading in Science Education through LLM-Driven Causal Reasoning and Multimodal Analysis

Haohao Zhu, Tingting Li, Peng He, Jiayu Zhou

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Human-Centred AI. Pages 10352-10360. https://doi.org/10.24963/ijcai.2025/1150

Automated assessment of open responses in K–12 science education poses significant challenges due to the multimodal nature of student work, which often integrates textual explanations, drawings, and handwritten elements. Traditional evaluation methods that focus solely on textual analysis fail to capture the full breadth of student reasoning and are susceptible to biases such as handwriting neatness or answer length. In this paper, we propose a novel LLM-augmented multimodal evaluation framework that addresses these limitations through a comprehensive, bias-corrected grading system. Our approach leverages LLMs to generate causal knowledge graphs that encapsulate the essential conceptual relationships in a response, automatically comparing the graphs derived from student submissions against those derived from the grading rubrics. Experimental results demonstrate that our framework improves grading accuracy and consistency over deep supervised learning and few-shot LLM baselines.
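The abstract does not specify how the rubric and submission graphs are compared; a minimal sketch, assuming each causal knowledge graph is represented as a set of directed (cause, effect) edges over normalized concept labels, could score a student graph against a rubric graph by edge overlap (the representation and precision/recall scoring here are illustrative assumptions, not the paper's actual method):

```python
# Hedged sketch: score a student's causal concept graph against a
# rubric-derived reference graph. Graphs are modeled simply as sets of
# directed (cause, effect) edges; the scoring scheme is an assumption
# for illustration, not the framework described in the paper.

def normalize(label: str) -> str:
    """Normalize a concept label for matching (case/whitespace)."""
    return " ".join(label.lower().split())

def edge_set(edges):
    """Build a set of normalized directed (cause, effect) edges."""
    return {(normalize(c), normalize(e)) for c, e in edges}

def graph_score(student_edges, rubric_edges):
    """Precision/recall/F1 of the student's causal edges vs. the rubric's."""
    s, r = edge_set(student_edges), edge_set(rubric_edges)
    if not s or not r:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    tp = len(s & r)  # causal links the student got right
    precision = tp / len(s)
    recall = tp / len(r)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical rubric and student response for a water-cycle item.
rubric = [("heat source", "water temperature rises"),
          ("water temperature rises", "evaporation increases")]
student = [("Heat Source", "water temperature rises"),
           ("evaporation increases", "clouds form")]
print(graph_score(student, rubric))
# → {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Matching on normalized edges rather than raw text is what makes such a comparison robust to surface variation in student wording; a fuller system would presumably map free-text concepts to rubric concepts with the LLM before scoring.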
Keywords: IJCAI25: Human-Centred AI