
Thousands of Samples Are Needed to Generate a Robust Gene List for Predicting Outcome in Cancer

Overview
Specialty Science
Date 2006 Apr 6
PMID 16585533
Citations 259
Abstract

Predicting at the time of discovery the prognosis and metastatic potential of cancer is a major challenge in current clinical research. Numerous recent studies searched for gene expression signatures that outperform traditionally used clinical parameters in outcome prediction. Finding such a signature will free many patients of the suffering and toxicity associated with adjuvant chemotherapy given to them under current protocols, even though they do not need such treatment. A reliable set of predictive genes also will contribute to a better understanding of the biological mechanism of metastasis. Several groups have published lists of predictive genes and reported good predictive performance based on them. However, the gene lists obtained for the same clinical types of patients by different groups differed widely and had only very few genes in common. This lack of agreement raised doubts about the reliability and robustness of the reported predictive gene lists, and the main source of the problem was shown to be the small number of samples that were used to generate the gene lists. Here, we introduce a previously undescribed mathematical method, probably approximately correct (PAC) sorting, for evaluating the robustness of such lists. We calculate for several published data sets the number of samples that are needed to achieve any desired level of reproducibility. For example, to achieve a typical overlap of 50% between two predictive lists of genes, breast cancer studies would need the expression profiles of several thousand early discovery patients.
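The dependence of list overlap on sample count can be illustrated with a small Monte Carlo sketch. This is not the PAC sorting method introduced in the abstract; it is a toy simulation under assumed conditions. The function names (`top_genes`, `mean_overlap`), the number of genes, the list size, and the distribution of true effects are all hypothetical choices made for illustration, not values from the paper.

```python
import math
import random


def top_genes(n_samples, effects, list_size, rng):
    """Return the set of top-ranked genes from one simulated study.

    Assumption: each gene is scored by the absolute difference in mean
    expression between two equally sized outcome groups with unit variance.
    For Gaussian data that difference is distributed N(effect, 4 / n_samples),
    so it is sampled directly instead of simulating every patient.
    """
    se = math.sqrt(4.0 / n_samples)
    scored = [(abs(rng.gauss(effect, se)), gene)
              for gene, effect in enumerate(effects)]
    scored.sort(reverse=True)
    return {gene for _, gene in scored[:list_size]}


def mean_overlap(n_samples, n_genes=500, list_size=20, trials=10, seed=0):
    """Average fractional overlap between two independently derived gene lists."""
    rng = random.Random(seed)
    # Hypothetical true effects: many weak signals, fixed across all studies.
    effects = [rng.gauss(0.0, 0.2) for _ in range(n_genes)]
    total = 0.0
    for _ in range(trials):
        list_a = top_genes(n_samples, effects, list_size, rng)
        list_b = top_genes(n_samples, effects, list_size, rng)
        total += len(list_a & list_b) / list_size
    return total / trials


if __name__ == "__main__":
    for n in (20, 100, 500, 2000, 10000):
        print(n, round(mean_overlap(n), 2))
```

In this toy setting the overlap between two lists stays low when each study has only a few dozen samples and rises as the sample count grows, which is the qualitative behaviour the abstract describes. The specific sample counts at which a given overlap is reached depend entirely on the assumed effect sizes and should not be read as estimates for real data sets.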

Citing Articles

Competing endogenous RNAs (ceRNAs) and drug resistance to cancer therapy.

To K, Zhang H, Cho W Cancer Drug Resist. 2024; 7:37.

PMID: 39403602 PMC: 11472581. DOI: 10.20517/cdr.2024.66.


All (remains) in the family? Using healthy relatives to define Crohn's gut microbiome alterations.

Amir A, Haberman Y Cell Rep Med. 2024; 5(7):101651.

PMID: 39019007 PMC: 11293313. DOI: 10.1016/j.xcrm.2024.101651.


Few-shot genes selection: subset of PAM50 genes for breast cancer subtypes classification.

Okimoto L, Mendonca-Neto R, Nakamura F, Nakamura E, Fenyo D, Silva C BMC Bioinformatics. 2024; 25(1):92.

PMID: 38429657 PMC: 10908178. DOI: 10.1186/s12859-024-05715-8.


Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD.

Tomasoni D, Lombardo R, Lauria M Front Genet. 2024; 15:1270387.

PMID: 38348453 PMC: 10859452. DOI: 10.3389/fgene.2024.1270387.


Gastric cancer with enhanced myogenesis is associated with less cell proliferation, enriched epithelial-to-mesenchymal transition and angiogenesis, and poor clinical outcomes.

Chida K, Oshi M, An N, Kanazawa H, Roy A, Mann G Am J Cancer Res. 2024; 14(1):355-367.

PMID: 38323295 PMC: 10839307.

