AQT: Adversarial Query Transformers for Domain Adaptive Object Detection

Wei-Jie Huang; Yu-Lin Lu; Shih-Yao Lin; Yusheng Xie; Yen-Yu Lin

doi:10.24963/ijcai.2022/136

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence

Main Track. Pages 972-979. https://doi.org/10.24963/ijcai.2022/136

Adversarial feature alignment is widely used in domain adaptive object detection. Despite its effectiveness on CNN-based detectors, its applicability to transformer-based detectors is less studied. In this paper, we present AQT (adversarial query transformers) to integrate adversarial feature alignment into detection transformers. The generator is a detection transformer that yields a sequence of feature tokens, and the discriminator consists of a novel adversarial token and a stack of cross-attention layers. The cross-attention layers take the adversarial token as the query and the feature tokens from the generator as the key-value pairs. Through adversarial learning, the adversarial token in the discriminator attends to the domain-specific feature tokens, while the generator learns to produce domain-invariant features, especially on the attended tokens, thereby realizing adversarial feature alignment on transformers. Thorough experiments on several domain adaptive object detection benchmarks demonstrate that our approach performs favorably against state-of-the-art methods. Source code is available at https://github.com/weii41392/AQT.
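The abstract describes the discriminator as an adversarial token that queries the generator's feature tokens through a stack of cross-attention layers. The dependency-free Python sketch below illustrates only that data flow. It makes several assumptions that do not come from the abstract: single-head attention, a residual update of the query, and a linear domain classifier with a sigmoid on the final token. All names (`QueryDiscriminator`, `CrossAttentionLayer`, `domain_bce`) are hypothetical and are not taken from the authors' repository. The weights are random and untrained, so the sketch shows shapes and attention flow, not learned behavior.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionLayer:
    """Hypothetical single-head cross-attention: one query attends over feature tokens."""

    def __init__(self, dim: int, rng: random.Random):
        self.dim = dim
        self.w_q = random_matrix(rng, dim, dim)
        self.w_k = random_matrix(rng, dim, dim)
        self.w_v = random_matrix(rng, dim, dim)

    def __call__(self, query: Vector, tokens: List[Vector]) -> Tuple[Vector, Vector]:
        q = matvec(self.w_q, query)
        keys = [matvec(self.w_k, t) for t in tokens]    # feature tokens act as keys
        values = [matvec(self.w_v, t) for t in tokens]  # ... and as values
        scale = math.sqrt(self.dim)
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(self.dim)]
        # Assumed residual connection: the updated query = input query + attended values
        return [qi + ai for qi, ai in zip(query, attended)], weights


class QueryDiscriminator:
    """Hypothetical discriminator: adversarial token + stacked cross-attention + linear head."""

    def __init__(self, dim: int, num_layers: int, seed: int = 0):
        rng = random.Random(seed)
        self.adv_token = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # would be learnable
        self.layers = [CrossAttentionLayer(dim, rng) for _ in range(num_layers)]
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def forward(self, tokens: List[Vector]) -> Tuple[float, List[Vector]]:
        query = self.adv_token
        attn_maps = []
        for layer in self.layers:
            query, weights = layer(query, tokens)
            attn_maps.append(weights)  # which feature tokens the adversarial token attends to
        logit = sum(w * x for w, x in zip(self.w_out, query))
        prob_target = 1.0 / (1.0 + math.exp(-logit))  # assumed: P(tokens come from target domain)
        return prob_target, attn_maps


def domain_bce(prob_target: float, is_target: bool) -> float:
    """Binary cross-entropy for the domain label (assumed form of the discriminator loss)."""
    eps = 1e-12
    return -math.log(prob_target + eps) if is_target else -math.log(1.0 - prob_target + eps)


if __name__ == "__main__":
    rng = random.Random(42)
    feature_tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
    disc = QueryDiscriminator(dim=8, num_layers=3)
    prob, maps = disc.forward(feature_tokens)
    print(f"p(target) = {prob:.3f}")
    print("last-layer attention:", [round(w, 3) for w in maps[-1]])
    print("loss if target:", domain_bce(prob, is_target=True))
```

In adversarial training, the discriminator would be updated to minimize `domain_bce` while the generator is updated to maximize it, for example through a gradient reversal layer. The abstract does not specify the mechanism, so treat that choice as an assumption. The per-layer attention weights returned by `forward` are where one would look to see which feature tokens the adversarial token treats as most domain-specific.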

Keywords:

Computer Vision: Transfer, low-shot, semi- and un- supervised learning

Computer Vision: Recognition (object detection, categorization)