Phonovisual Biases in Language: is the Lexicon Tied to the Visual World?

Andrea Gregor de Varda, Carlo Strapparava

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 643-649. https://doi.org/10.24963/ijcai.2021/89

This paper studies cross-linguistic and cross-modal iconicity within a deep learning framework. An LSTM-based Recurrent Neural Network is trained to associate the phonetic representation of a concrete word, encoded as a sequence of feature vectors, with the visual representation of its referent, expressed as an HCNN-transformed image. The processing network is then tested, without further training, on a language that does not appear in the training set and belongs to a different language family. The model's performance is evaluated against a randomized baseline; we show that such an imaginative network is capable of extracting language-independent generalizations in the mapping from linguistic sounds to visual features, providing empirical support for the hypothesis of a universal sound-symbolic substrate underlying all languages.
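To make the described architecture concrete, the following is a minimal PyTorch sketch of such a phoneme-to-image network, not the authors' implementation: an LSTM consumes a word as a sequence of phonetic feature vectors, and its final hidden state is projected into the space of CNN-derived image embeddings. The dimensions, the MSE objective, and all names are illustrative assumptions.

    # Minimal sketch (assumptions throughout): map phonetic feature
    # sequences to visual embeddings with an LSTM and a linear projection.
    import torch
    import torch.nn as nn

    PHON_DIM = 22    # assumed size of one phonetic feature vector
    IMG_DIM = 2048   # assumed size of the visual (image) embedding
    HIDDEN = 256     # assumed LSTM hidden size

    class ImaginativeNet(nn.Module):
        """Predicts a visual embedding from a phonetic sequence."""
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(PHON_DIM, HIDDEN, batch_first=True)
            self.proj = nn.Linear(HIDDEN, IMG_DIM)

        def forward(self, phon_seq):           # (batch, time, PHON_DIM)
            _, (h_n, _) = self.lstm(phon_seq)  # h_n: (1, batch, HIDDEN)
            return self.proj(h_n[-1])          # (batch, IMG_DIM)

    model = ImaginativeNet()
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # One illustrative training step on random stand-in data.
    phon = torch.randn(32, 10, PHON_DIM)  # 32 words, 10 phonemes each
    img = torch.randn(32, IMG_DIM)        # matching image embeddings
    optimizer.zero_grad()
    loss = loss_fn(model(phon), img)
    loss.backward()
    optimizer.step()

Under this reading, the zero-shot evaluation would feed words from a held-out language (from a different family) through the trained model and compare the quality of the predicted embeddings with a baseline in which phoneme-image pairings are randomized.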
Keywords:
Computer Vision: Language and Vision
Natural Language Processing: Phonology, Morphology, and Word Segmentation
Natural Language Processing: Psycholinguistics