MZDASoft: A Software Architecture That Enables Large-Scale Comparison of Protein Expression Levels over Multiple Samples Based on Liquid Chromatography/Tandem Mass Spectrometry

Overview

Journal Rapid Commun Mass Spectrom

Specialty Chemistry

Date 2015 Sep 3

PMID 26331936

Citations 1

Authors

Mehrab Ghanat Bari

Nelson Ramirez

Zhiwei Wang

Jianqiu Michelle Zhang

Abstract

Rationale: Without accurate peak linking/alignment, the expression levels of only a small percentage of proteins can be compared across multiple samples in Liquid Chromatography/Mass Spectrometry/Tandem Mass Spectrometry (LC/MS/MS), because tandem MS peptide identification is selective. This greatly hampers biomedical research that aims to find biomarkers for disease diagnosis and treatment and to understand disease mechanisms. A recent algorithm, PeakLink, accurately links LC/MS peaks that lack tandem MS identifications to their identified counterparts across multiple samples collected from different instruments, tissues and labs, which greatly improves the ability to compare proteins. However, PeakLink cannot be implemented practically for large numbers of samples on existing software architectures, because it requires simultaneous access to peak elution profiles from multiple LC/MS/MS samples.

Methods: We propose a new architecture based on parallel processing that extracts LC/MS peak features and saves them in database files, enabling the implementation of PeakLink for multiple samples. The software has been deployed in High-Performance Computing (HPC) environments. The core part of the software, MZDASoft Parallel Peak Extractor (PPE), can be downloaded with a user's and developer's guide and can be run directly at HPC centers. The quantification applications, MZDASoft TandemQuant and MZDASoft PeakLink, are written in MATLAB and compiled with a MATLAB runtime compiler. A sample script that incorporates all necessary processing steps of MZDASoft for LC/MS/MS quantification in a parallel processing environment is available. The project webpage is http://compgenomics.utsa.edu/zgroup/MZDASoft.
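The extract-then-store pattern described in the Methods paragraph can be illustrated with a minimal Python sketch. It is not the MZDASoft PPE code: the toy data, the apex-picking rule, the table layout, and every name (SAMPLES, extract_peaks, process_sample, run_all, load_peaks) are assumptions made for illustration only. A thread pool stands in for the HPC worker processes so the example runs anywhere; on a cluster, a process pool or a job scheduler would presumably take its place.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical toy input: per sample, a list of (retention_time, mz, intensity) points.
SAMPLES = {
    "sample_A": [(10.0, 500.25, 1.0), (10.1, 500.25, 5.0), (10.2, 500.25, 2.0),
                 (20.0, 622.8, 3.0), (20.1, 622.8, 9.0), (20.2, 622.8, 4.0)],
    "sample_B": [(10.3, 500.25, 2.0), (10.4, 500.25, 6.0), (10.5, 500.25, 1.0)],
}


def extract_peaks(points):
    """Toy feature extraction (an assumption, not the published method):
    group points by m/z and report the apex and summed intensity per group."""
    by_mz = {}
    for rt, mz, intensity in points:
        by_mz.setdefault(mz, []).append((rt, intensity))
    features = []
    for mz, profile in sorted(by_mz.items()):
        apex_rt, apex_intensity = max(profile, key=lambda p: p[1])
        area = sum(intensity for _, intensity in profile)
        features.append((mz, apex_rt, apex_intensity, area))
    return features


def process_sample(name, points, out_dir):
    """Extract features for one sample and save them in its own database file."""
    db_path = Path(out_dir) / f"{name}.sqlite"
    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "CREATE TABLE peaks (mz REAL, apex_rt REAL, apex_intensity REAL, area REAL)"
        )
        con.executemany("INSERT INTO peaks VALUES (?, ?, ?, ?)", extract_peaks(points))
        con.commit()
    finally:
        con.close()
    return db_path


def run_all(samples, out_dir, workers=2):
    """Process every sample concurrently; each worker writes a separate file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(process_sample, name, points, out_dir)
            for name, points in samples.items()
        }
        return {name: future.result() for name, future in futures.items()}


def load_peaks(db_path):
    """Read stored features back, as a later linking step might do."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            "SELECT mz, apex_rt, apex_intensity, area FROM peaks ORDER BY mz"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in run_all(SAMPLES, tmp).items():
            print(name, load_peaks(path))
```

Because each sample's features land in a separate database file, a downstream step can open several files at once and query only the peaks it needs, which is the kind of simultaneous multi-sample access the abstract says PeakLink requires.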

Results: The proposed architecture enables the implementation of PeakLink for multiple samples. In test cases, significantly more proteins (100%-500% more) can be compared across multiple samples, with better quantification accuracy.

Conclusion: MZDASoft enables large-scale comparison of protein expression levels over multiple samples, with much larger protein comparison coverage and better quantification accuracy. It is an efficient implementation based on parallel processing that can be used to process large amounts of data.

Citing Articles

Early response index: a statistic to discover potential early stage disease biomarkers.

Salekin S, Bari M, Raphael I, Forsthuber T, Zhang J. BMC Bioinformatics. 2017; 18(1):313.

PMID: 28645323 PMC: 5481992. DOI: 10.1186/s12859-017-1712-y.
