GASP: Gated Attention for Saliency Prediction

Fares Abawi, Tom Weber, Stefan Wermter

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 584-591. https://doi.org/10.24963/ijcai.2021/81

Saliency prediction refers to the computational task of modeling overt attention. Social cues greatly influence our attention, consequently altering our eye movements and behavior. To emphasize the efficacy of such features, we present a neural model for integrating social cues and weighting their influences. Our model consists of two stages. During the first stage, we detect two social cues by following gaze, estimating gaze direction, and recognizing affect. These features are then transformed into spatiotemporal maps through image processing operations. The transformed representations are propagated to the second stage (GASP), where we explore various late-fusion techniques for integrating social cues and introduce two sub-networks for directing attention to relevant stimuli. Our experiments indicate that fusion approaches achieve better results for static integration methods, whereas non-fusion approaches, for which the influence of each modality is unknown, result in better outcomes when coupled with recurrent models for dynamic saliency prediction. We show that gaze direction and affective representations improve the correspondence between predictions and ground truth by at least 5% compared to dynamic saliency models without social cues. Furthermore, affective representations improve GASP, supporting the necessity of considering affect-biased attention in predicting saliency.
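
The idea of weighting each social cue's influence before late fusion can be illustrated with a minimal sketch. It is not the GASP architecture: it assumes one scalar sigmoid gate per modality (in a trained model the gates would be learned and could vary per location), and the names `gated_late_fusion`, `maps`, and `gate_logits` are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a real-valued logit into a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_late_fusion(maps, gate_logits):
    """Fuse per-modality saliency maps with scalar sigmoid gates.

    maps:        dict of modality name -> 2D list of floats (same shape).
    gate_logits: dict of modality name -> float. These values are
                 hypothetical stand-ins for parameters a model would learn.
    Returns the fused map and the gate value of each modality.
    """
    names = list(maps)
    height = len(maps[names[0]])
    width = len(maps[names[0]][0])

    # One gate per modality; a larger gate means a stronger influence.
    gates = {name: sigmoid(gate_logits[name]) for name in names}
    total = sum(gates.values())

    # Gate-weighted average of the modality maps at every location.
    fused = [
        [
            sum(gates[name] * maps[name][row][col] for name in names) / total
            for col in range(width)
        ]
        for row in range(height)
    ]
    return fused, gates


# Toy 1x2 maps for two assumed modalities.
example_maps = {
    "gaze": [[1.0, 0.0]],
    "affect": [[0.0, 1.0]],
}
fused, gates = gated_late_fusion(example_maps, {"gaze": 0.0, "affect": 0.0})
```

With equal gate logits the fused map is the plain average of the inputs. Raising one modality's logit shifts the fused map toward that modality, and the returned gate values make each modality's influence inspectable.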
Keywords:
Computer Vision: 2D and 3D Computer Vision
Computer Vision: Biometrics, Face and Gesture Recognition
Machine Learning: Deep Learning