
On the Core Segmentation Algorithms of Copy Number Variation Detection Tools

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2024 Feb 10
PMID: 38340093
Abstract

Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although numerous CNV detection tools are based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. We therefore conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools from the past decade. We found that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of these two core segmentation algorithms in CNV detection across varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which makes it possible to exclude in advance bias corrections, such as guanine-cytosine (GC) content correction, during the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing data, HMM is more robust than CBS. (4) On large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These findings can provide important guidance and a reference for researchers developing new tools for CNV detection.
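
The shared rationale described in the abstract (a per-bin read depth signal that is then segmented) can be illustrated with a minimal sketch of the HMM side. This is not the authors' model or any tool's implementation: the three states, the copy ratios, the Poisson emission model, the `stay` probability and all function names are assumptions chosen for illustration, and the input is a hand-made, noise-free depth vector rather than real sequencing data.

```python
import math

# Hypothetical three-state model: expected depth scales with copy number.
STATES = ("loss", "neutral", "gain")
COPY_RATIO = (0.5, 1.0, 1.5)  # assumed depth ratios for 1, 2 and 3 copies


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads in a bin whose mean depth is lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_segment(depths, baseline, stay=0.99):
    """Return the most likely state index per bin (Viterbi decoding)."""
    if not depths:
        return []
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)  # probability of moving to each other state
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]
    means = [baseline * ratio for ratio in COPY_RATIO]

    # Uniform prior over states for the first bin.
    score = [math.log(1.0 / n) + poisson_logpmf(depths[0], m) for m in means]
    back = []  # back[t][j] = best predecessor of state j at bin t + 1
    for k in depths[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + poisson_logpmf(k, means[j]))
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def to_segments(path):
    """Collapse a per-bin state path into (start, end, label) runs, end exclusive."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((start, i, STATES[path[start]]))
            start = i
    return segments


# Toy read depth signal: baseline 30 reads per bin, one loss and one gain.
depths = [30] * 20 + [15] * 10 + [30] * 20 + [45] * 10 + [30] * 20
segments = to_segments(viterbi_segment(depths, baseline=30.0))
print(segments)
```

On this idealized input the decoder recovers the five runs exactly. With noisy counts or lower depth the per-bin evidence shrinks relative to the switching penalty, which is the kind of regime the comparison in the abstract is concerned with; a CBS counterpart would instead test recursively for change points in the mean of the signal.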

Citing Articles

HapCNV: A Comprehensive Framework for CNV Detection in Low-input DNA Sequencing Data.

Yu X, Qin F, Liu S, Brown N, Lu Q, Cai G bioRxiv. 2025.

PMID: 39763944 PMC: 11702719. DOI: 10.1101/2024.12.19.629494.


Copy Number Variation in Asthma: An Integrative Review.

Garcia F, de Sousa V, Silva-Dos-Santos P, Fernandes I, Sarquis Serpa F, Paula F Clin Rev Allergy Immunol. 2025; 68(1):4.

PMID: 39755867 DOI: 10.1007/s12016-024-09015-0.


LoRA-TV: read depth profile-based clustering of tumor cells in single-cell sequencing.

Duan J, Zhao X, Wu X Brief Bioinform. 2024; 25(4).

PMID: 38877886 PMC: 11179121. DOI: 10.1093/bib/bbae277.
