
Statistical Approaches to Candidate Biomarker Panel Selection

Overview
Date 2016 Dec 16
PMID 27975231
Citations 4
Abstract

Statistical analysis to identify robust biomarker candidates is a complex process that enters into several key steps of the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers, getting a feel for the nature of the data, and judging whether there appear to be any differences among the groups being examined. From there, the data must be pre-processed (Sect. 22.2) so that outliers are handled, missing values are dealt with, and normality is assessed. Once the data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed to identify differentially expressed proteins. Because the number of differentially expressed proteins is usually larger than warrants further investigation (50 or more proteins, versus the handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list has been reduced to the proteins likely to be most useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
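As a rough illustration only, the hypothesis-testing and feature-reduction steps described above might be sketched as follows. This is not the chapter's own method: the simulated data, the choice of a permutation test on the difference in group means, the Benjamini-Hochberg adjustment, the panel size of three, and every function name are assumptions made for the sake of a small, self-contained example.

```python
import random


def mean_diff(a, b):
    """Difference in sample means between two groups."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p(a, b, n_perm, rng):
    """Two-sided permutation p-value for the difference in means."""
    observed = abs(mean_diff(a, b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:len(a)], pooled[len(a):])) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_panel(cases, controls, alpha=0.05, max_size=3, n_perm=1000, seed=0):
    """Test each protein, then keep at most max_size of the significant ones.

    cases and controls are lists with one list of intensities per protein.
    Returns (hits, panel): all significant protein indices, and the reduced panel.
    """
    rng = random.Random(seed)
    pvals = [permutation_p(a, b, n_perm, rng) for a, b in zip(cases, controls)]
    adj = benjamini_hochberg(pvals)
    hits = [i for i, q in enumerate(adj) if q < alpha]
    # Simplest possible feature reduction: rank by adjusted p-value.
    panel = sorted(hits, key=lambda i: adj[i])[:max_size]
    return hits, panel


# Simulated, already pre-processed data (an assumption): 20 proteins,
# 10 samples per group, with proteins 0-2 shifted upward in the cases.
data_rng = random.Random(42)
controls = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(20)]
cases = [[data_rng.gauss(4.0 if p < 3 else 0.0, 1.0) for _ in range(10)]
         for p in range(20)]

hits, panel = select_panel(cases, controls)
print("significant proteins:", hits)
print("reduced panel:", panel)
```

Ranking by adjusted p-value is only a placeholder for the feature-reduction step; in practice a method that accounts for correlation among proteins, or that is tied to the downstream classifier, may be a better fit.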

Citing Articles

Application of SWATH Mass Spectrometry and Machine Learning in the Diagnosis of Inflammatory Bowel Disease Based on the Stool Proteome.

Shajari E, Gagne D, Malick M, Roy P, Noel J, Gagnon H. Biomedicines. 2024; 12(2).

PMID: 38397935 PMC: 10886680. DOI: 10.3390/biomedicines12020333.


Breath Biopsy to Identify Exhaled Volatile Organic Compounds Biomarkers for Liver Cirrhosis Detection.

Ferrandino G, De Palo G, Murgia A, Birch O, Tawfike A, Smith R. J Clin Transl Hepatol. 2023; 11(3):638-648.

PMID: 36969895 PMC: 10037526. DOI: 10.14218/JCTH.2022.00309.


Lessons and tips for designing a machine learning study using EHR data.

Arbet J, Brokamp C, Meinzen-Derr J, Trinkley K, Spratt H. J Clin Transl Sci. 2021; 5(1):e21.

PMID: 33948244 PMC: 8057454. DOI: 10.1017/cts.2020.513.


Computational advances of tumor marker selection and sample classification in cancer proteomics.

Tang J, Wang Y, Luo Y, Fu J, Zhang Y, Li Y. Comput Struct Biotechnol J. 2020; 18:2012-2025.

PMID: 32802273 PMC: 7403885. DOI: 10.1016/j.csbj.2020.07.009.
