StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation

Peter Schaldenbrand, Zhixuan Liu, Jean Oh

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
AI and Arts. Pages 4966-4972. https://doi.org/10.24963/ijcai.2022/688

Generating images that fit a given text description using machine learning has improved greatly with the release of technologies such as the CLIP image-text encoder model; however, current methods offer little artistic control over the style of the generated image. We present an approach for generating styled drawings from a text description, in which a user specifies the desired drawing style with a sample image. Inspired by a theory in art that style and content are generally inseparable during the creative process, we propose a coupled approach, known here as StyleCLIPDraw, in which the drawing is generated by optimizing for style and content simultaneously throughout the process, rather than by first creating the content and then applying style transfer in sequence. In a human evaluation, the styles of images generated by StyleCLIPDraw were strongly preferred to those produced by the sequential approach. Although the quality of content generation degrades for certain styles, StyleCLIPDraw was strongly preferred overall when both content and style were considered. This indicates that the style, look, and feel of machine-generated images matter to people, and that style is coupled in the drawing process itself. Our code, a demonstration, and style evaluation data are publicly available.
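The difference between the coupled and sequential schedules can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic `content_loss` and `style_loss` below are hypothetical stand-ins for the CLIP-based content objective and the style objective, the two-element list `x` stands in for the drawing parameters, and all names and constants are assumptions chosen for illustration. The sketch only shows how the two optimization schedules differ in structure; its numbers say nothing about the paper's results.

```python
# Toy illustration of coupled vs. sequential optimization schedules.
# All losses and constants are hypothetical stand-ins, not the paper's objectives.

CONTENT_TARGET = [1.0, 1.0]  # assumed optimum of the toy content loss
STYLE_TARGET = 3.0           # assumed optimum of the toy style loss (first coordinate only)


def content_loss(x):
    """Toy content loss: squared distance to CONTENT_TARGET."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, CONTENT_TARGET))


def style_loss(x):
    """Toy style loss: depends only on the first parameter."""
    return (x[0] - STYLE_TARGET) ** 2


def content_grad(x):
    """Analytic gradient of content_loss."""
    return [2.0 * (xi - ci) for xi, ci in zip(x, CONTENT_TARGET)]


def style_grad(x):
    """Analytic gradient of style_loss."""
    return [2.0 * (x[0] - STYLE_TARGET), 0.0]


def descend(x, grad_fn, steps=200, lr=0.1):
    """Plain gradient descent from x using the supplied gradient function."""
    x = list(x)
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def coupled(x0, style_weight=1.0):
    """Optimize content and style together at every step."""
    def joint_grad(x):
        return [c + style_weight * s
                for c, s in zip(content_grad(x), style_grad(x))]
    return descend(x0, joint_grad)


def sequential(x0):
    """Optimize content first, then apply a style-only phase to the result."""
    x = descend(x0, content_grad)
    return descend(x, style_grad)


start = [0.0, 0.0]
x_coupled = coupled(start)
x_sequential = sequential(start)

print("coupled:   ", x_coupled, content_loss(x_coupled), style_loss(x_coupled))
print("sequential:", x_sequential, content_loss(x_sequential), style_loss(x_sequential))
```

In the coupled schedule every update sees both gradients, so the result balances the two objectives. In the sequential schedule the second phase sees only the style gradient, and in this simplified toy it carries no term that preserves the content found in the first phase.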
Keywords:
Application domains: Images and visual arts
Methods and resources: Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems: Autonomous creative or artistic AI
Theory and philosophy of arts and creativity in AI systems: Evaluation of artistic or creative outputs, or of systems