MCD-CLIP: Multi-view Chest Disease Diagnosis with Disentangled CLIP

Songyue Cai, Yujie Mo, Liang Peng, Yucheng Xie, Tao Tong, Xiaofeng Zhu

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 702-710. https://doi.org/10.24963/ijcai.2025/79

Pre-trained methods for multi-view chest X-ray images have demonstrated impressive performance in chest disease diagnosis, but several limitations remain. First, many pre-trained methods require full fine-tuning of the pre-trained model, which incurs significant computational cost and destroys prior knowledge. Second, many pre-trained methods cannot efficiently balance consistency and complementarity among views, leading to information loss and performance degradation. To tackle these issues, we propose MCD-CLIP, a CLIP-based multi-view chest disease diagnosis method. It uses visual prompts and a Prompt-Aligner to align prompts across views, along with an additional text representation, for efficient transfer. Moreover, we employ Adapters to disentangle the image representation, preserving both consistency and complementarity across views. Experimental results on a chest X-ray dataset demonstrate that MCD-CLIP achieves comparable or better performance on a variety of tasks while using 94.31% fewer tunable parameters than state-of-the-art methods. The source code is available at https://github.com/YuzunoKawori/MCD-CLIP.
Keywords:
Computer Vision: CV: Transfer, low-shot, semi- and un-supervised learning
Computer Vision: CV: Biomedical image analysis
Computer Vision: CV: Multimodal learning