
Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants

Overview
Specialty Biology
Date 2021 Jul 5
PMID 34224879
Citations 6
Abstract

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered the simultaneous occurrence of simple structural variants. However, the compounded mutational signals of CSVs are difficult to detect with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing data. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates for CSVs on real data, based on experimental and computational validation as well as manual inspection, are around 70%, and the median breakpoint shifts are 13 bp for experimental validation and 26 bp for computational validation. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on CSV formation. Mako is publicly available at https://github.com/xjtu-omics/Mako.
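The abstract describes the approach only at a high level. The Python sketch below is a toy illustration of the general idea, not Mako's implementation: it assumes breakpoints have already been identified and that each potential connection between two breakpoints carries a read-support count. The `Edge` tuples, the `min_support` threshold, and all function names are hypothetical. Pattern growth is simplified here to a breadth-first expansion from a seed breakpoint along well-supported edges, and any grown pattern with more than two breakpoints is reported as a candidate CSV, following the definition of a CSV given above.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical input: an edge links two breakpoint IDs and carries the number of
# reads (e.g. discordant pairs or split reads) assumed to support that connection.
Edge = Tuple[str, str, int]


def build_graph(edges: List[Edge], min_support: int) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, keeping only well-supported connections."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, support in edges:
        if support >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def grow_pattern(graph: Dict[str, Set[str]], seed: str) -> Set[str]:
    """Grow a pattern from a seed breakpoint by repeatedly adding connected breakpoints."""
    pattern = {seed}
    frontier = deque([seed])
    while frontier:
        node = frontier.popleft()
        for neighbor in graph[node]:
            if neighbor not in pattern:
                pattern.add(neighbor)
                frontier.append(neighbor)
    return pattern


def candidate_csvs(edges: List[Edge], min_support: int = 3) -> List[Set[str]]:
    """Return grown patterns with more than two breakpoints (candidate CSVs)."""
    graph = build_graph(edges, min_support)
    seen: Set[str] = set()
    candidates: List[Set[str]] = []
    for seed in sorted(graph):  # deterministic seed order
        if seed in seen:
            continue
        pattern = grow_pattern(graph, seed)
        seen |= pattern
        if len(pattern) > 2:  # two breakpoints would be a simple SV
            candidates.append(pattern)
    return candidates


if __name__ == "__main__":
    # Toy example: bp3-bp4 has too little support and is dropped;
    # bp5-bp6 forms a two-breakpoint pattern and is not reported.
    example = [("bp1", "bp2", 5), ("bp2", "bp3", 4), ("bp3", "bp4", 1), ("bp5", "bp6", 6)]
    print(candidate_csvs(example))
```

With the toy input, the link between `bp3` and `bp4` falls below the default support threshold, so the function returns a single candidate containing `bp1`, `bp2`, and `bp3`; the pattern `{bp5, bp6}` has only two breakpoints and is left out. This sketch ignores breakpoint positions, orientations, and read-depth signals entirely.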

Citing Articles

Comprehensive evaluation and guidance of structural variation detection tools in chicken whole genome sequence data.

Ma C, Shi X, Li X, Zhang Y, Peng M BMC Genomics. 2024; 25(1):970.

PMID: 39415108 PMC: 11481438. DOI: 10.1186/s12864-024-10875-1.


Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders.

Zhou B, Arthur J, Guo H, Kim T, Huang Y, Pattni R Cell. 2024; 187(23):6687-6706.e25.

PMID: 39353437 PMC: 11608572. DOI: 10.1016/j.cell.2024.09.014.


Pindel-TD: A Tandem Duplication Detector Based on A Pattern Growth Approach.

Yang X, Zheng G, Jia P, Wang S, Ye K Genomics Proteomics Bioinformatics. 2024; 22(1).

PMID: 38862430 PMC: 11425056. DOI: 10.1093/gpbjnl/qzae008.


SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads.

Denti L, Khorsand P, Bonizzoni P, Hormozdiari F, Chikhi R Nat Methods. 2022; 20(4):550-558.

PMID: 36550274 DOI: 10.1038/s41592-022-01674-1.


Population-scale genotyping of structural variation in the era of long-read sequencing.

Quan C, Lu H, Lu Y, Zhou G Comput Struct Biotechnol J. 2022; 20:2639-2647.

PMID: 35685364 PMC: 9163579. DOI: 10.1016/j.csbj.2022.05.047.


Soylev A, Le T, Amini H, Alkan C, Hormozdiari F . Discovery of tandem and interspersed segmental duplications using high-throughput sequencing. Bioinformatics. 2019; 35(20):3923-3930. PMC: 6792081. DOI: 10.1093/bioinformatics/btz237. View