XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving

Zijian Song; Huikun Bi; Ruisi Zhang; Tianlu Mao; Zhaoqi Wang

doi:10.24963/ijcai.2023/34

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence

Main Track. Pages 298-308. https://doi.org/10.24963/ijcai.2023/34

Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving. Sensor data represented in multiple views is now easy to access. However, existing models have not evaluated cross-view consistency, which can lead to divergences between the multimodal predictions from different views. Predictions made by a network that does not comprehend the 3D scene are neither practical nor effective, and can leave the downstream module in a dilemma. Instead, we predict multimodal trajectories while maintaining cross-view consistency. We present a cross-view trajectory prediction method using shared 3D queries (XVTP3D). We employ a set of 3D queries shared across views to generate multiple goals that are cross-view consistent. We also propose a random mask method and coarse-to-fine cross-attention to capture robust cross-view features. As far as we know, this is the first work to introduce the top-down paradigm from the BEV detection field to a trajectory prediction problem. Experimental results on two publicly available datasets show that XVTP3D achieves state-of-the-art performance with consistent cross-view predictions.
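The idea of cross-view consistency through shared 3D queries can be illustrated with a toy sketch. This is not the authors' implementation: the axis convention (x forward, y left, z up), the pinhole front-view model, and the names `project_bev`, `project_front`, and `goals_per_view` are all assumptions made for illustration. It shows only that when the goals in every view are derived from the same 3D points, the per-view goals agree by construction.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_bev(p: Point3D) -> Point2D:
    """Bird's-eye view (assumed convention): keep ground-plane x, y and drop height z."""
    x, y, _z = p
    return (x, y)


def project_front(p: Point3D, focal: float = 1.0) -> Point2D:
    """Toy pinhole front view looking along +x (an assumption, not from the paper)."""
    x, y, z = p
    if x <= 0:
        raise ValueError("point must lie in front of the camera (x > 0)")
    return (focal * y / x, focal * z / x)


def goals_per_view(queries: List[Point3D]) -> Dict[str, List[Point2D]]:
    """Derive every view's 2D goals from one shared list of 3D points."""
    return {
        "bev": [project_bev(q) for q in queries],
        "front": [project_front(q) for q in queries],
    }


# Two hypothetical 3D goal candidates shared by both views.
shared_queries: List[Point3D] = [(10.0, 2.0, 0.0), (20.0, -4.0, 1.0)]
views = goals_per_view(shared_queries)
```

Because goal `i` in the BEV list and goal `i` in the front-view list come from the same 3D point, the two views cannot diverge on which mode they describe. In the actual method the queries are learned and refined by attention over view features rather than fixed by hand.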

Keywords:

Agent-based and Multi-agent Systems: MAS: Multi-agent learning

Agent-based and Multi-agent Systems: MAS: Human-agent interaction

Computer Vision: CV: Machine learning for vision