ContextAware: A Multi-Agent Framework for Detecting Harmful Image-Based Comments on Social Media

Zheng Wei, Mingchen Li, Pu Zhang, Xinyu Liu, Huamin Qu, Pan Hui

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI and Social Good. Pages 9927-9935. https://doi.org/10.24963/ijcai.2025/1103

Detecting hidden stigmatization on social media is difficult for two reasons: the textual and visual modalities are often semantically misaligned, and implicit stigmatization is subtle. Traditional approaches often fail to capture these complexities in real-world, multimodal content. To address this gap, we introduce ContextAware, an agent-based framework in which specialized modules collaboratively process and analyze images, textual context, and social interactions. Our approach first clusters image embeddings to identify recurring content. It then activates high-likes agents for deeper analysis of images that receive substantial user engagement, while comprehensive agents handle lower-engagement images. By integrating case-based learning, textual sentiment, and vision-language models (VLMs), ContextAware refines its detection of harmful content. We evaluate ContextAware on a self-collected Douyin dataset focused on interracial relationships, comprising 871 short videos and 885,502 comments, a notable portion of which are image-based. Experimental results show that ContextAware not only outperforms state-of-the-art methods in accuracy and F1 score but also effectively detects implicit stigmatization within the highly contextual environment of social media. Our findings underscore the importance of agent-based architectures and multimodal alignment in capturing nuanced, culturally specific forms of harmful content.
Keywords:
Agent-based and Multi-agent Systems: General
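
The routing step described in the abstract (cluster image embeddings, then send high-engagement content to high-likes agents and the rest to comprehensive agents) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy cosine-similarity clustering, the per-cluster like total, both thresholds, and all names (`ImageComment`, `cluster_by_similarity`, `route_clusters`) are assumptions chosen only to make the idea concrete.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageComment:
    comment_id: str
    embedding: list[float]  # hypothetical image embedding vector
    likes: int


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(comments, sim_threshold=0.9):
    """Greedy clustering (an assumed stand-in for the paper's method):
    a comment joins the first cluster whose first member is similar enough,
    otherwise it starts a new cluster."""
    clusters = []
    for c in comments:
        for cluster in clusters:
            if cosine(cluster[0].embedding, c.embedding) >= sim_threshold:
                cluster.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def route_clusters(clusters, likes_threshold=100):
    """Assign each cluster to an agent type based on its total likes.
    The threshold value and the use of a per-cluster total are assumptions."""
    routes = []
    for cluster in clusters:
        total_likes = sum(c.likes for c in cluster)
        agent = "high_likes" if total_likes >= likes_threshold else "comprehensive"
        routes.append((agent, [c.comment_id for c in cluster]))
    return routes


# Toy data: "a" and "b" are near-duplicate images, "c" is unrelated.
comments = [
    ImageComment("a", [1.0, 0.0], 80),
    ImageComment("b", [0.99, 0.05], 40),
    ImageComment("c", [0.0, 1.0], 3),
]
routes = route_clusters(cluster_by_similarity(comments))
```

In this toy example the two near-duplicate images form one cluster whose combined likes exceed the assumed threshold, so that cluster is routed to the high-likes agent, while the isolated low-engagement image goes to the comprehensive agent.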