Unified Molecule-Text Language Model with Discrete Token Representation

Shuhan Guo, Yatao Bian, Ruibing Wang, Nan Yin, Zhen Wang, Quanming Yao

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI4Tech: AI Enabling Technologies. Pages 9205-9213. https://doi.org/10.24963/ijcai.2025/1023

The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that fail to integrate the molecule and text modalities on an equal footing and lack explicit supervision signals for the molecular modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLMs with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecules and text. This tokenizer transforms molecular structures into sequences of tokens with causal dependency, encapsulating both high-level molecular features and textual information. Equipped with this tokenizer, UniMoT unifies the molecule and text modalities under a shared token representation and an autoregressive training paradigm, enabling the model to process molecular structures as a distinct linguistic system and to generate them in textual form. Through a four-stage training scheme, UniMoT functions as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.
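
To make the tokenizer idea concrete, the sketch below (not the authors' released code) illustrates how a Vector Quantization-driven molecule tokenizer with a Q-Former-style cross-attention module might map molecule features to discrete token ids appended after the text vocabulary of an LLM. All module names, dimensions, and the dummy graph-encoder features are illustrative assumptions.

```python
# Minimal illustrative sketch of a VQ-driven molecule tokenizer.
# Assumptions: molecule node features come from some graph encoder (not shown);
# codebook indices are offset to occupy new slots in an expanded LLM vocabulary.
import torch
import torch.nn as nn


class VQMoleculeTokenizer(nn.Module):
    def __init__(self, mol_dim=300, hidden_dim=256, num_queries=8,
                 codebook_size=2048, llm_vocab_size=32000):
        super().__init__()
        # Learnable queries attend to molecule features (Q-Former-style cross-attention).
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_dim))
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.mol_proj = nn.Linear(mol_dim, hidden_dim)
        # Discrete codebook: each code index becomes a new token id in the LLM vocabulary.
        self.codebook = nn.Embedding(codebook_size, hidden_dim)
        self.llm_vocab_size = llm_vocab_size

    def forward(self, mol_feats):
        # mol_feats: (batch, num_atoms, mol_dim) node features from a graph encoder (assumed).
        kv = self.mol_proj(mol_feats)
        q = self.queries.unsqueeze(0).expand(mol_feats.size(0), -1, -1)
        attended, _ = self.cross_attn(q, kv, kv)              # (batch, num_queries, hidden_dim)
        # Nearest-neighbor vector quantization against the codebook.
        codebook = self.codebook.weight.unsqueeze(0).expand(attended.size(0), -1, -1)
        dists = torch.cdist(attended, codebook)                # (batch, num_queries, codebook_size)
        code_ids = dists.argmin(dim=-1)                        # discrete molecule tokens
        quantized = self.codebook(code_ids)
        # Straight-through estimator so gradients flow back to the encoder side.
        quantized = attended + (quantized - attended).detach()
        # Offset code ids so they land in the slots appended after the text vocabulary.
        llm_token_ids = code_ids + self.llm_vocab_size
        return llm_token_ids, quantized


if __name__ == "__main__":
    tokenizer = VQMoleculeTokenizer()
    fake_mol = torch.randn(2, 30, 300)        # 2 molecules, 30 atoms each (dummy features)
    token_ids, _ = tokenizer(fake_mol)
    print(token_ids.shape)                     # torch.Size([2, 8]) molecule tokens per molecule
```

Under this view, the resulting molecule token ids can be interleaved with ordinary text tokens and trained with the same next-token prediction objective, which is what allows a single autoregressive LLM to handle both molecule-to-text and text-to-molecule directions.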
Keywords:
Advanced AI4Tech: Multimodal AI4Tech
Advanced AI4Tech: AI4Tech foundations
Advanced AI4Tech: Data-driven AI4Tech
Emerging AI4Tech: Emerging AI4Tech areas