U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation

Zizhuo Li, Shihua Zhang, Jiayi Ma

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 1169-1176. https://doi.org/10.24963/ijcai.2023/130

Capturing local context has become the core factor in achieving leading performance in two-view correspondence learning. Recent work has devised various local context extractors, but these typically rely on explicit neighborhood relation modeling, which is restricted and inflexible. To address this issue, we introduce U-Match, an attentional graph neural network that flexibly enables implicit local context awareness at multiple levels. Specifically, we design a hierarchy-aware graph representation (HAGR) module built from local context pooling and unpooling operations. Pooling encodes local context by adaptively sampling a set of nodes to form a coarse-grained graph, while unpooling decodes local context by recovering the coarsened graph back to its original size. Moreover, we propose an orthogonal fusion module that works together with the HAGR module to integrate complementary local and global information into compact feature representations without redundancy. Extensive experiments on different visual tasks demonstrate that our method significantly surpasses state-of-the-art methods. In particular, U-Match attains an AUC of 60.53% at a 5-degree threshold on the challenging YFCC100M dataset without RANSAC, outperforming the strongest prior model by 8.61 absolute percentage points. Our code is publicly available at https://github.com/ZizhuoLi/U-Match.
Keywords:
Computer Vision: CV: Motion and tracking
Computer Vision: CV: Image and video retrieval
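
The pooling, unpooling, and orthogonal fusion ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how nodes are sampled or how fusion is computed, so the sketch substitutes fixed per-node scores with top-k selection for the learned adaptive sampling, zero-filling for the learned recovery step, and a plain vector projection for the fusion module. All function names (`top_k_pool`, `unpool`, `orthogonal_component`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def top_k_pool(features: Sequence[Sequence[float]],
               scores: Sequence[float],
               k: int) -> Tuple[List[Vector], List[int]]:
    """Keep the k highest-scoring nodes (assumed stand-in for adaptive sampling).

    Returns the coarse features and the original indices of the kept nodes,
    with the indices in ascending order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:k])
    return [list(features[i]) for i in kept], kept


def unpool(coarse: Sequence[Sequence[float]],
           kept: Sequence[int],
           num_nodes: int) -> List[Vector]:
    """Scatter coarse features back to a graph with num_nodes nodes.

    Nodes that were dropped during pooling receive zero vectors here; a
    learned model would fill them in differently.
    """
    dim = len(coarse[0]) if coarse else 0
    restored = [[0.0] * dim for _ in range(num_nodes)]
    for row, index in zip(coarse, kept):
        restored[index] = list(row)
    return restored


def orthogonal_component(local: Sequence[float],
                         global_: Sequence[float]) -> Vector:
    """Remove from `local` the part parallel to `global_`.

    What remains is orthogonal to `global_`, so concatenating it with
    `global_` adds no information already carried by the global feature.
    """
    dot = sum(l * g for l, g in zip(local, global_))
    norm_sq = sum(g * g for g in global_)
    if norm_sq == 0.0:
        return list(local)  # nothing to project onto
    scale = dot / norm_sq
    return [l - scale * g for l, g in zip(local, global_)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 1.0]]
    scores = [0.1, 0.9, 0.5, 0.3]  # illustrative values, not learned
    coarse, kept = top_k_pool(feats, scores, k=2)
    print(kept, unpool(coarse, kept, num_nodes=len(feats)))
    print(orthogonal_component([3.0, 1.0], [1.0, 0.0]))
```

In this sketch the pooled graph keeps the index list so that unpooling can place each coarse feature back at its original position, which is the minimum bookkeeping needed to restore the graph to its original size.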