Appearance Prompt Vision Transformer for Connectome Reconstruction

Appearance Prompt Vision Transformer for Connectome Reconstruction

Rui Sun, Naisong Luo, Yuwen Pan, Huayu Mai, Tianzhu Zhang, Zhiwei Xiong, Feng Wu

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 1423-1431. https://doi.org/10.24963/ijcai.2023/158

Neural connectivity reconstruction aims to understand the function of biological reconstruction and promote basic scientific research. The intricate morphology and densely intertwined branches make it an extremely challenging task. Most previous best-performing methods adopt affinity learning or metric learning. Nevertheless, they either neglect to model explicit voxel semantics caused by implicit optimization or are hysteresis to spatial information. Furthermore, the inherent locality of 3D CNNs limits modeling long-range dependencies, leading to sub-optimal results. In this work, we propose a coherent and unified Appearance Prompt Vision Transformer (APViT) to integrate affinity and metric learning to exploit the complementarity by learning long-range spatial dependencies. The proposed APViT enjoys several merits. First, the extension continuity-aware attention module aims at constructing hierarchical attention customized for neuron extensibility and slice continuity to learn instance voxel semantic context from a global perspective and utilize continuity priors to enhance voxel spatial awareness. Second, the appearance prompt modulator is responsible for leveraging voxel-adaptive appearance knowledge conditioned on affinity rich in spatial information to instruct instance voxel semantics, exploiting the potential of affinity learning to complement metric learning. Extensive experimental results on multiple challenging benchmarks demonstrate that our APViT achieves consistent improvements with huge flexibility under the same post-processing strategy.
Keywords:
Computer Vision: CV: Biomedical image analysis
Computer Vision: CV: Segmentation