PMID: 26860878

A Description of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Common Data Analysis Pipeline

Abstract

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics data sets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and nonreference markers of cancer. The CPTAC laboratories have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these data sets were produced from 2D liquid chromatography-tandem mass spectrometry analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data according to the following: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false-discovery rate-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the data sets is specific to the sample processing, with PSM and protein reports containing the spectrum-level or gene-level ("rolled-up") precursor peak areas and spectral counts for label-free or reporter ion log-ratios for 4plex iTRAQ. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data to enable comparisons between different samples and cancer types as well as across the major omics fields.
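As an illustration of steps (3) and (4) and the gene-level "rolled-up" spectral counts, the following is a minimal Python sketch, not the CDAP implementation. It assumes a target-decoy estimate of the false-discovery rate (the abstract does not say how the FDR is computed), a score where higher is better, and PSMs that have already been assigned to a single gene. The function names, dictionary keys, and gene labels are all hypothetical.

```python
from collections import defaultdict


def filter_by_fdr(psms, max_fdr=0.01):
    """Return target PSMs from the largest top-scoring set whose estimated
    FDR is at most max_fdr.

    Assumption: FDR is estimated as (#decoys / #targets) among all PSMs
    ranked at or above a given position (a simple target-decoy estimate).
    Each PSM is a dict with hypothetical keys 'score', 'is_decoy', 'gene'.
    """
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    decoys = targets = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for i, psm in enumerate(ranked, start=1):
        if psm["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p["is_decoy"]]


def gene_spectral_counts(psms):
    """Roll PSMs up to gene level by counting spectra per gene."""
    counts = defaultdict(int)
    for psm in psms:
        counts[psm["gene"]] += 1
    return dict(counts)


# Toy input: gene labels "A" and "B" are placeholders, not real data.
example_psms = [
    {"score": 10.0, "is_decoy": False, "gene": "A"},
    {"score": 9.0, "is_decoy": False, "gene": "A"},
    {"score": 8.0, "is_decoy": False, "gene": "B"},
    {"score": 7.0, "is_decoy": True, "gene": "B"},
    {"score": 6.0, "is_decoy": False, "gene": "B"},
]
kept = filter_by_fdr(example_psms, max_fdr=0.01)
gene_counts = gene_spectral_counts(kept)
```

With this toy input, the first decoy at rank 4 pushes the estimated FDR above 1%, so only the three higher-scoring target PSMs are kept and counted per gene. A production pipeline would also need to handle score ties and peptides shared between genes, which this sketch ignores.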

Citing Articles

Advancements in proteogenomics for preclinical targeted cancer therapy research.

Suo Y, Song Y, Wang Y, Liu Q, Rodriguez H, Zhou H Biophys Rep. 2025; 11(1):56-76.

PMID: 40070661 PMC: 11891078. DOI: 10.52601/bpr.2024.240053.


Deciphering the potential ability of DExD/H-box helicase 60 (DDX60) on the proliferation, diagnostic and prognostic biomarker in pancreatic cancer: a research based on silico, RNA-seq and molecular biology experiment.

Zhang D, Zhang E, Cai Y, Sun Y, Zeng P, Jiang X Hereditas. 2025; 162(1):6.

PMID: 39844327 PMC: 11753068. DOI: 10.1186/s41065-024-00361-9.


Comprehensive Analysis Reveals That ISCA1 Is Correlated with Ferroptosis-Related Genes Across Cancers and Is a Biomarker in Thyroid Carcinoma.

Xiong D, Li Z, Zuo L, Ge J, Gu Y, Zhang E Genes (Basel). 2025; 15(12.

PMID: 39766805 PMC: 11675480. DOI: 10.3390/genes15121538.


The role of KRT18 in lung adenocarcinoma development: integrative bioinformatics and experimental validation.

Li Y, Zeng M, Qin Y, Feng F, Wei H Discov Oncol. 2024; 15(1):841.

PMID: 39729139 PMC: 11680526. DOI: 10.1007/s12672-024-01728-0.


Reference Materials for Improving Reliability of Multiomics Profiling.

Ren L, Shi L, Zheng Y Phenomics. 2024; 4(5):487-521.

PMID: 39723231 PMC: 11666855. DOI: 10.1007/s43657-023-00153-7.

