PMID: 34753956

Whole Genome and Exome Sequencing Reference Datasets from a Multi-center and Cross-platform Benchmark Study

Abstract

With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. There is therefore an urgent need both to develop best practices for cancer mutation detection using NGS and to establish standard reference datasets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets, generated from well-characterized reference samples, represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines and for cancer genomics studies.

Citing Articles

A detailed analysis of second and third-generation sequencing approaches for accurate length determination of short tandem repeats and homopolymers.

Jeanjean S, Shen Y, Hardy L, Daunay A, Delepine M, Gerber Z. Nucleic Acids Res. 2025; 53(5).

PMID: 40036507 PMC: 11878640. DOI: 10.1093/nar/gkaf131.


Enhancing Clinical Applications by Evaluation of Sensitivity and Specificity in Whole Exome Sequencing.

Moon Y, Hong C, Kim Y, Kim J, Ye S, Kang E. Int J Mol Sci. 2025; 25(24).

PMID: 39769013 PMC: 11678496. DOI: 10.3390/ijms252413250.


Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.

McDaniel J, Patel V, Olson N, He H, He Z, Cole K. bioRxiv. 2024.

PMID: 39345378 PMC: 11429686. DOI: 10.1101/2024.09.18.613544.


Epigenomic, transcriptomic and proteomic characterizations of reference samples.

Nepal C, Chen W, Chen Z, Wrobel J, Xie L, Liao W. bioRxiv. 2024.

PMID: 39314461 PMC: 11419083. DOI: 10.1101/2024.09.09.612110.


VCF observer: a user-friendly software tool for preliminary VCF file analysis and comparison.

Emul A, Ergun M, Erturk R, Cinal O, Baysan M. BMC Bioinformatics. 2024; 25(1):290.

PMID: 39227760 PMC: 11373448. DOI: 10.1186/s12859-024-05860-0.

