Multimodal Inference with Incremental Tabular Attributes
Xinda Chen, Zhen Xing, Zixian Zhang, Weimin Tan, Bo Yan
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 4878-4886.
https://doi.org/10.24963/ijcai.2025/543
Multimodal learning with visual and tabular modalities has become increasingly popular, especially in healthcare. As new equipment is adopted or new factors are introduced, the tabular modality keeps changing. However, the standard process for training multimodal AI models requires tables to have fixed columns during training and inference, and is therefore unsuitable for handling dynamically changing tables. New methods are thus needed to handle such tables efficiently in multimodal learning. In this paper, we introduce a new task, multimodal inference with incremental tabular attributes, which aims to enable trained multimodal models to efficiently leverage incremental tabular attributes during the inference stage. We implement a specialized encoder that disentangles the latent representations of incremental tabular attributes both among themselves and from the old attributes to reduce information redundancy, and further aligns the incremental attributes with the visual modality via a consistency loss to improve information richness. Experimental results on five public datasets show that our method effectively utilizes incremental tabular attributes and achieves state-of-the-art performance in general scenarios. Beyond inference, we also find that our method achieves better performance in fully supervised settings, suggesting a new training style for multimodal learning with tables.
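
To illustrate the kind of mechanism the abstract describes, the following is a minimal sketch, not the authors' implementation: a hypothetical encoder for newly added tabular columns, a cosine consistency loss that pulls its embeddings toward the visual embeddings, and a simple decorrelation term that reduces redundancy with the old-attribute embeddings. All module, function, and variable names are assumptions for illustration only.

# Minimal sketch (hypothetical, not the paper's code): consistency alignment of
# incremental tabular attribute embeddings with visual embeddings, plus a
# decorrelation term against old-attribute embeddings to reduce redundancy.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IncrementalAttributeEncoder(nn.Module):
    """Hypothetical encoder for newly added tabular attributes."""

    def __init__(self, num_new_attrs: int, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_new_attrs, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, new_attrs: torch.Tensor) -> torch.Tensor:
        return self.mlp(new_attrs)


def consistency_loss(new_emb: torch.Tensor, visual_emb: torch.Tensor) -> torch.Tensor:
    # Pull incremental-attribute embeddings toward the visual embeddings.
    return 1.0 - F.cosine_similarity(new_emb, visual_emb, dim=-1).mean()


def redundancy_loss(new_emb: torch.Tensor, old_emb: torch.Tensor) -> torch.Tensor:
    # Penalize cross-correlation between new- and old-attribute embeddings.
    new_n = F.normalize(new_emb - new_emb.mean(0), dim=0)
    old_n = F.normalize(old_emb - old_emb.mean(0), dim=0)
    cross_corr = new_n.T @ old_n / new_emb.size(0)
    return cross_corr.pow(2).mean()


if __name__ == "__main__":
    batch, num_new_attrs, embed_dim = 8, 5, 128
    encoder = IncrementalAttributeEncoder(num_new_attrs, embed_dim)
    new_attrs = torch.randn(batch, num_new_attrs)   # incremental tabular columns
    visual_emb = torch.randn(batch, embed_dim)      # e.g. from a frozen image encoder
    old_emb = torch.randn(batch, embed_dim)         # from the old tabular encoder
    new_emb = encoder(new_attrs)
    loss = consistency_loss(new_emb, visual_emb) + 0.1 * redundancy_loss(new_emb, old_emb)
    loss.backward()
    print(float(loss))

In this sketch only the new encoder's parameters receive gradients, mirroring the idea of adapting to incremental attributes at inference time without retraining the full multimodal model; the actual loss weighting and architecture in the paper may differ.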
Keywords:
Machine Learning: ML: Multi-modal learning
Computer Vision: CV: Multimodal learning
Machine Learning: ML: Incremental learning
