Ploidyfrost: Reference-free Estimation of Ploidy Level from Whole Genome Sequencing Data Based on De Bruijn Graphs

Abstract

Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
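The quantitative step mentioned in the abstract, fitting a Gaussian mixture to the allele frequency distribution, can be illustrated with a simplified sketch. This is not ploidyfrost's implementation. It assumes that a sample of ploidy p shows allele-frequency peaks at i/p for i = 1..p-1, and it scores each candidate ploidy with an equal-weight mixture whose means are fixed at those positions. The function names, the fixed standard deviation `sigma`, and the example frequencies are all hypothetical.

```python
import math


def log_likelihood(freqs, ploidy, sigma=0.05):
    """Log-likelihood of allele frequencies under an equal-weight Gaussian
    mixture with components centred at i/ploidy for i = 1..ploidy-1.

    Assumption: fixed means, equal weights, and a shared fixed sigma.
    A fitted mixture model would estimate these from the data instead.
    """
    means = [i / ploidy for i in range(1, ploidy)]
    weight = 1.0 / len(means)
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for f in freqs:
        density = sum(
            weight * math.exp(-0.5 * ((f - m) / sigma) ** 2) / norm
            for m in means
        )
        # Tiny constant guards against log(0) for far-off observations.
        total += math.log(density + 1e-300)
    return total


def best_ploidy(freqs, candidates=(2, 3, 4, 5, 6), sigma=0.05):
    """Return the candidate ploidy whose fixed-mean mixture fits best."""
    return max(candidates, key=lambda p: log_likelihood(freqs, p, sigma))


# Hypothetical allele frequencies at heterozygous sites.
diploid_like = [0.48, 0.50, 0.52, 0.49, 0.51]
triploid_like = [0.32, 0.34, 0.66, 0.68, 0.33, 0.67]
tetraploid_like = [0.25, 0.50, 0.75, 0.24, 0.51, 0.76]

for name, freqs in [
    ("diploid-like", diploid_like),
    ("triploid-like", triploid_like),
    ("tetraploid-like", tetraploid_like),
]:
    print(name, "-> estimated ploidy", best_ploidy(freqs))
```

Because the component weights are equal, a higher candidate ploidy spreads its probability mass over more peaks, so it is penalised when the data occupy only a subset of them. This is why frequencies clustered at 0.5 favour ploidy 2 over ploidy 4 in this sketch, even though both models have a component at 0.5.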

Citing Articles

Variant calling in polyploids for population and quantitative genetics.

Phillips A. Appl Plant Sci. 2024; 12(4):e11607.

PMID: 39184203 PMC: 11342233. DOI: 10.1002/aps3.11607.


nQuack: An R package for predicting ploidal level from sequence data using site-based heterozygosity.

Gaynor M, Landis J, O'Connor T, Laport R, Doyle J, Soltis D. Appl Plant Sci. 2024; 12(4):e11606.

PMID: 39184199 PMC: 11342224. DOI: 10.1002/aps3.11606.


LocoGSE, a sequence-based genome size estimator for plants.

Guenzi-Tiberi P, Istace B, Alsos I, Coissac E, Lavergne S, Aury J. Front Plant Sci. 2024; 15:1328966.

PMID: 38550287 PMC: 10972871. DOI: 10.3389/fpls.2024.1328966.


Development of a risk model to predict prognosis in breast cancer based on cGAS-STING-related genes.

Chen C, Wang J, Dong C, Lim D, Feng Z. Front Genet. 2023; 14:1121018.

PMID: 37051596 PMC: 10083333. DOI: 10.3389/fgene.2023.1121018.


ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs.

Sun M, Pang E, Bai W, Zhang D, Lin K. Mol Ecol Resour. 2022; 23(2):499-510.

PMID: 36239149 PMC: 10092044. DOI: 10.1111/1755-0998.13720.
