Mat-Instructions: A Large-Scale Inorganic Material Instruction Dataset for Large Language Models
Ke Liu, Shangde Gao, Yichao Fu, Xiaoliang Wu, Shuo Tong, Ajitha Rajan, Hao Xu
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
AI and Social Good. Pages 9799-9807.
https://doi.org/10.24963/ijcai.2025/1089
Recent advancements in large language models (LLMs) have revolutionized research and discovery across various scientific disciplines, including materials science. The discovery of novel materials, particularly crystal materials, is essential for achieving the Sustainable Development Goals (SDGs), as such materials drive breakthroughs in climate change mitigation, clean and affordable energy, and industrial innovation. However, unlocking the full potential of LLMs in materials research remains challenging due to the lack of high-quality, diverse, instruction-based datasets. Such datasets are crucial for guiding these models in understanding and predicting the structures, properties, and functions of materials across various tasks. To address this limitation, we introduce Mat-Instructions, a large-scale inorganic material instruction dataset specifically designed to unlock the potential of LLMs in materials science. Extensive experiments on fine-tuning LLaMA with our Mat-Instructions dataset demonstrate its effectiveness in advancing materials science. The code and dataset are available at https://github.com/zjuKeLiu/Mat-Instructions
Keywords:
Multidisciplinary Topics and Applications: General
