A Comprehensive and Systematic Review for Deep Learning-Based De Novo Peptide Sequencing
Jun Xia, Jingbo Zhou, Shaorong Chen, Tianze Ling, Stan Z. Li
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Survey Track. Pages 10733-10741.
https://doi.org/10.24963/ijcai.2025/1191
Tandem mass spectrometry (MS/MS) has revolutionized the field of proteomics, enabling the high-throughput identification of proteins. However, peptide identification remains one of the central challenges in mass spectrometry-based proteomics, especially in the absence of a comprehensive peptide database. Traditional database search methods compare observed mass spectra against pre-existing protein databases, so they are limited by the availability and completeness of those databases. De novo peptide sequencing, which derives peptide sequences directly from mass spectra, has emerged as a crucial approach in such cases. In recent years, deep learning has made significant strides in this domain: deep learning-based methods train neural networks to translate mass spectra into peptide sequences without relying on any pre-constructed database. Despite this progress, the field still lacks a comprehensive and systematic review. In this paper, we provide the first review of deep learning-based de novo peptide sequencing techniques from the perspectives of data types, model architectures, decoding strategies, applications, and evaluation metrics. We also identify key challenges and highlight promising avenues for future research, providing a valuable resource for the AI and scientific communities.
Keywords:
Multidisciplinary Topics and Applications: MTA: Life sciences
Machine Learning: ML: Representation learning
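The abstract's central idea, deriving a peptide sequence directly from a spectrum, rests on one physical fact: consecutive fragment ions of the same series differ by exactly one amino-acid residue mass. The sketch below illustrates only that principle and is not any of the deep learning methods the review covers. It assumes an idealized case: an unmodified peptide, a complete and noise-free ladder of singly charged b-ions, and a known singly protonated precursor mass. Isoleucine is left out because it has the same mass as leucine. All function names (`b_ion_ladder`, `precursor_mh`, `read_ladder`) are hypothetical.

```python
# Illustrative sketch only: reads a peptide from an idealized b-ion ladder.
# Assumptions: no modifications, no noise, every b-ion present, charge 1+.

# Standard monoisotopic residue masses in daltons. "I" is omitted because it
# has exactly the same mass as "L", so mass alone cannot tell them apart.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ion_ladder(peptide):
    """Singly charged b-ion m/z values b1..b(n-1) for a peptide."""
    ladder, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ladder.append(total + PROTON)
    return ladder


def precursor_mh(peptide):
    """Singly protonated precursor mass [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def residue_for(gap, tol):
    """Map a mass gap to a residue; ambiguous gaps are bracketed, misses are '?'."""
    matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(gap - mass) <= tol]
    if len(matches) == 1:
        return matches[0]
    return "[" + "/".join(matches) + "]" if matches else "?"


def read_ladder(peaks, precursor, tol=0.02):
    """Infer a sequence from sorted b-ion peaks plus the precursor mass."""
    sequence = []
    previous = PROTON  # virtual b0 ion: only the charge-carrying proton
    for peak in peaks:
        sequence.append(residue_for(peak - previous, tol))
        previous = peak
    # The last residue is not covered by b1..b(n-1); its mass equals
    # [M+H]+ - H2O - b(n-1).
    sequence.append(residue_for(precursor - WATER - previous, tol))
    return "".join(sequence)


peaks = b_ion_ladder("SAMPLER")
print(read_ladder(peaks, precursor_mh("SAMPLER")))  # prints SAMPLER
```

Real spectra contain noise, missing fragments, multiple ion types, and post-translational modifications. Those complications are why greedy gap-reading like this breaks down in practice and why learned models and search-based decoding strategies are used instead. The 0.02 Da tolerance is an illustrative choice; it happens to keep glutamine and lysine (about 0.036 Da apart) distinguishable.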
