
MZDASoft: a Software Architecture That Enables Large-scale Comparison of Protein Expression Levels over Multiple Samples Based on Liquid Chromatography/tandem Mass Spectrometry

Overview
Specialty Chemistry
Date 2015 Sep 3
PMID 26331936
Citations 1
Abstract

Rationale: Because tandem MS identifies peptides selectively, only a small percentage of proteins can have their expression levels compared across multiple samples in liquid chromatography/mass spectrometry/tandem mass spectrometry (LC/MS/MS) unless peaks are accurately linked/aligned. This greatly hampers biomedical research that aims to find biomarkers for disease diagnosis and treatment and to understand disease mechanisms. A recent algorithm, PeakLink, accurately links LC/MS peaks without tandem MS identifications to their corresponding identified peaks across multiple samples collected from different instruments, tissues, and labs, which greatly enhances the ability to compare proteins. However, PeakLink cannot be implemented practically for large numbers of samples on existing software architectures, because it requires simultaneous access to peak elution profiles from multiple LC/MS/MS samples.

Methods: We propose a new architecture based on parallel processing that extracts LC/MS peak features and saves them in database files, enabling the implementation of PeakLink for multiple samples. The software has been deployed in High-Performance Computing (HPC) environments. The core part of the software, MZDASoft Parallel Peak Extractor (PPE), can be downloaded with a user and developer's guide and can be run directly at HPC centers. The quantification applications, MZDASoft TandemQuant and MZDASoft PeakLink, are written in Matlab and compiled with a Matlab runtime compiler. A sample script that incorporates all necessary processing steps of MZDASoft for LC/MS/MS quantification in a parallel processing environment is available. The project webpage is http://compgenomics.utsa.edu/zgroup/MZDASoft.
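The extract-then-store pattern described in the Methods can be illustrated with a minimal Python sketch. It is not MZDASoft code: the function names (`detect_peaks`, `extract_sample`, `extract_all`, `load_peaks`), the SQLite schema, the toy local-maximum peak detector, and the sample data are all assumptions made for illustration. A thread pool stands in for the HPC job scheduler so the example stays self-contained; on a cluster, each sample would more plausibly run as a separate process or job.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def detect_peaks(points, min_intensity=1000.0):
    """Toy detector (assumption): keep local intensity maxima above a threshold.

    `points` is a list of (mz, retention_time, intensity) tuples for one trace.
    """
    ordered = sorted(points, key=lambda p: p[1])  # order by retention time
    peaks = []
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        if cur[2] >= min_intensity and cur[2] > prev[2] and cur[2] > nxt[2]:
            peaks.append(cur)
    return peaks


def extract_sample(name, points, out_dir):
    """Extract peak features for one sample and save them in its own database file."""
    db_path = Path(out_dir) / f"{name}.sqlite"
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits the transaction on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS peaks (mz REAL, rt REAL, intensity REAL)"
            )
            conn.executemany(
                "INSERT INTO peaks VALUES (?, ?, ?)", detect_peaks(points)
            )
    finally:
        conn.close()
    return db_path


def extract_all(samples, out_dir, max_workers=4):
    """Run the per-sample extraction concurrently; samples are independent."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(extract_sample, name, pts, out_dir)
            for name, pts in samples.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


def load_peaks(db_paths):
    """Read stored peaks from several samples, as a linking step would need."""
    result = {}
    for name, path in db_paths.items():
        conn = sqlite3.connect(path)
        try:
            result[name] = conn.execute(
                "SELECT mz, rt, intensity FROM peaks ORDER BY rt"
            ).fetchall()
        finally:
            conn.close()
    return result


# Hypothetical data: one small trace per sample.
samples = {
    "sample_a": [(500.25, 10.0, 200.0), (500.25, 10.5, 5000.0), (500.25, 11.0, 300.0)],
    "sample_b": [(500.26, 10.2, 100.0), (500.26, 10.7, 4200.0), (500.26, 11.2, 150.0)],
}
out_dir = tempfile.mkdtemp()
db_paths = extract_all(samples, out_dir)
peaks = load_peaks(db_paths)
print(peaks)
```

Storing each sample's features in a separate file means a later cross-sample step can open only the databases it needs, rather than holding every raw LC/MS/MS run in memory at once.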

Results: The proposed architecture enables the implementation of PeakLink for multiple samples. In test cases, significantly more proteins (100%-500% more) can be compared over multiple samples, with better quantification accuracy.

Conclusion: MZDASoft enables large-scale comparison of protein expression levels over multiple samples, with much larger protein comparison coverage and better quantification accuracy. It is an efficient implementation based on parallel processing that can be used to process large amounts of data.

Citing Articles

Early response index: a statistic to discover potential early stage disease biomarkers.

Salekin S, Bari M, Raphael I, Forsthuber T, Zhang J BMC Bioinformatics. 2017; 18(1):313.

PMID: 28645323 PMC: 5481992. DOI: 10.1186/s12859-017-1712-y.
