VQCounter: Designing Visual Prompt Queue for Accurate Open-World Counting
Fanfan Ye, Yiqi Fan, Qiaoyong Zhong, Shicai Yang, Di Xie, Jie Song, Mingli Song
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 2260-2268.
https://doi.org/10.24963/ijcai.2025/252
Class-agnostic counting enables enumerating arbitrary object classes beyond those seen during training. Recent studies have attempted to exploit the potential of visual foundation models such as GroundingDINO. Despite considerable progress, we observe certain shortcomings, including the limited diversity of visual prompts and a suboptimal training regimen.
To address these issues, we introduce VQCounter, which incorporates a visual prompt queue mechanism designed to enrich the diversity of visual prompts.
A random modality switching strategy is proposed during training to strengthen both textual and visual modalities.
In addition, in light of the weak point supervision, a Voronoi diagram-based cost (VoronoiCost) is designed to improve Hungarian matching, leading to more stable and faster convergence.
Building upon the Voronoi diagram, we also propose a novel set of more stringent evaluation metrics, which take point localization into account.
Extensive experiments on the FSC-147 and CARPK datasets demonstrate that VQCounter achieves state-of-the-art performance in both zero-shot and few-shot settings, significantly outperforming existing methods across nearly all evaluations.
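The abstract does not give the formulation of VoronoiCost or of the Voronoi-based metrics, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a predicted point belongs to the Voronoi cell of its nearest ground-truth point, that the matching cost adds a fixed penalty when a prediction lies outside the cell of the candidate ground-truth point, and that a localization-aware error can be taken as the per-cell deviation from exactly one prediction. The names `nearest_gt_index`, `voronoi_cost_matrix`, `cell_count_error` and the parameter `outside_penalty` are hypothetical.

```python
import math


def nearest_gt_index(point, gt_points):
    """Return the index of the ground-truth point nearest to `point`.

    By definition of a Voronoi diagram, this is the cell containing `point`.
    Ties are broken in favor of the lowest index.
    """
    return min(range(len(gt_points)), key=lambda j: math.dist(point, gt_points[j]))


def voronoi_cost_matrix(pred_points, gt_points, outside_penalty=1.0):
    """Build a (num_pred x num_gt) cost matrix for bipartite matching.

    Assumed form (not taken from the paper): Euclidean distance, plus
    `outside_penalty` when the prediction is not inside the Voronoi cell
    of the candidate ground-truth point.
    """
    cost = []
    for p in pred_points:
        cell = nearest_gt_index(p, gt_points)
        row = [
            math.dist(p, g) + (0.0 if j == cell else outside_penalty)
            for j, g in enumerate(gt_points)
        ]
        cost.append(row)
    return cost


def cell_count_error(pred_points, gt_points):
    """Sum over Voronoi cells of |number of predictions in the cell - 1|.

    Assumed illustrative metric: it is zero only when every ground-truth
    cell receives exactly one predicted point.
    """
    counts = [0] * len(gt_points)
    for p in pred_points:
        counts[nearest_gt_index(p, gt_points)] += 1
    return sum(abs(c - 1) for c in counts)


if __name__ == "__main__":
    gt = [(0, 0), (10, 0)]
    preds = [(1, 0), (2, 0)]  # correct count, but both land in the first cell
    print(voronoi_cost_matrix(preds, gt))
    print(abs(len(preds) - len(gt)), cell_count_error(preds, gt))
```

In this toy case the plain counting error is 0, while the per-cell error is 2, which illustrates why a metric that accounts for point localization is stricter than a count-only metric. A cost matrix of this shape could be passed to a Hungarian solver such as `scipy.optimize.linear_sum_assignment`.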
Keywords:
Computer Vision: CV: Scene analysis and understanding
Computer Vision: CV: Multimodal learning
Computer Vision: CV: Transfer, low-shot, semi- and un- supervised learning
