A Generalized Protein Identification Method for Novel and Diverse Sequencing Technologies

Overview

Journal NAR Genom Bioinform

Publisher Oxford University Press

Specialty Biology

Date 2024 Sep 19

PMID 39296929

Authors

Bikash Kumar Bhandari

Nick Goldman

Abstract

Protein sequencing is a rapidly evolving field with much progress towards the realization of a new generation of protein sequencers. The early devices, however, may not be able to reliably discriminate all 20 amino acids, resulting in a partial, noisy and possibly error-prone signature of a protein. Rather than achieving sequencing, these devices may aim to identify target proteins by comparing such signatures to databases of known proteins. However, there are no broadly applicable methods for this identification problem. Here, we devise a hidden Markov model method to study the generalized problem of protein identification from noisy signature data. Based on a hypothetical sequencing device that can simulate several novel technologies, we show that on the human protein database (20 181 proteins) our method performs well under many different operating conditions, such as various levels of signal resolvability, different numbers of discriminated amino acids, sequence fragments, and insertion and deletion error rates. Our results demonstrate the possibility of protein identification with high accuracy on many early experimental devices. We anticipate that our method will be applicable to a wide range of protein sequencing devices in the future.
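To make the identification problem concrete, the following is a minimal sketch of likelihood-based matching of a noisy signature against a small database. It is not the authors' implementation: the three-class residue grouping, the error rates, the function names (reduce_sequence, log_likelihood, identify) and the toy database are all assumptions chosen for illustration. It scores a full-length signature with a simple forward algorithm that allows substitutions, insertions and deletions, and it does not handle sequence fragments.

```python
import math

# Hypothetical reduced alphabet: the device is assumed to resolve only
# three classes of residues (illustrative grouping, not from the paper).
CLASSES = {"K": "a", "R": "a", "H": "a", "D": "b", "E": "b"}
ALPHABET = "abc"  # every residue not listed above maps to class "c"


def reduce_sequence(protein):
    """Map an amino acid string to the reduced class alphabet."""
    return "".join(CLASSES.get(aa, "c") for aa in protein)


def logaddexp(x, y):
    """Return log(exp(x) + exp(y)) without overflow."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = max(x, y), min(x, y)
    return hi + math.log1p(math.exp(lo - hi))


def log_likelihood(ref, obs, p_sub=0.05, p_ins=0.02, p_del=0.02):
    """Forward log-probability of signature `obs` given reduced sequence `ref`.

    Assumed error model: before each reference position (and at the end) a
    uniformly random symbol is inserted with probability p_ins; otherwise the
    next residue is either deleted (p_del) or emitted, possibly substituted
    (p_sub) by one of the other symbols.
    """
    k = len(ALPHABET)
    log_ins = math.log(p_ins / k)
    log_del = math.log((1 - p_ins) * p_del)
    log_hit = math.log((1 - p_ins) * (1 - p_del) * (1 - p_sub))
    log_miss = math.log((1 - p_ins) * (1 - p_del) * p_sub / (k - 1))

    n, m = len(ref), len(obs)
    prev = None
    for i in range(n + 1):
        cur = [-math.inf] * (m + 1)
        for j in range(m + 1):
            if i == 0 and j == 0:
                cur[j] = 0.0  # start state: nothing consumed, nothing emitted
                continue
            total = -math.inf
            if j > 0:  # insertion: emit obs[j-1] without advancing in ref
                total = logaddexp(total, cur[j - 1] + log_ins)
            if i > 0:  # deletion: advance in ref without emitting
                total = logaddexp(total, prev[j] + log_del)
                if j > 0:  # match or substitution
                    emit = log_hit if ref[i - 1] == obs[j - 1] else log_miss
                    total = logaddexp(total, prev[j - 1] + emit)
            cur[j] = total
        prev = cur
    # Terminate: no further insertion after the last reference residue.
    return prev[m] + math.log(1 - p_ins)


def identify(obs, database):
    """Return the best-scoring protein name and all log-likelihoods."""
    scores = {
        name: log_likelihood(reduce_sequence(seq), obs)
        for name, seq in database.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy database with made-up sequences, for illustration only.
    toy_db = {"P1": "MKRDEAHLLK", "P2": "MDDEEGGKAA", "P3": "AAGGLLVVII"}
    # Signature of P1 with one residue missing (a simulated deletion).
    best, scores = identify("caabcacca", toy_db)
    print(best, scores)
```

In this sketch the protein with the highest log-likelihood is reported as the identification. Changing CLASSES and the error rates is one way to explore how the number of discriminated amino acids and the insertion and deletion rates affect the ranking.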
