Black-box Prompt Tuning for Vision-Language Model as a Service
Lang Yu, Qin Chen, Jiaju Lin, Liang He
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 1686-1694.
https://doi.org/10.24963/ijcai.2023/187
In the scenario of Model-as-a-Service (MaaS), pre-trained models are usually released as inference APIs. Users are allowed to query those models with manually crafted prompts. Without access to the network structure and gradient information, it is difficult to perform continuous prompt tuning on MaaS, especially for vision-language models (VLMs), where cross-modal interaction must be considered. In this paper, we propose a black-box prompt tuning framework for VLMs to learn task-relevant prompts without back-propagation. In particular, the vision and language prompts are jointly optimized in the intrinsic parameter subspace with various evolution strategies. Different prompt variants are also explored to enhance the cross-modal interaction. Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to train task-relevant prompts in a derivative-free manner.
Keywords:
Computer Vision: CV: Vision and language
Machine Learning: ML: Evolutionary learning
Machine Learning: ML: Multi-modal learning