
Measuring the Invisible: The Sequences Causal of Genome Size Differences in Eyebrights (Euphrasia) Revealed by K-mers

Overview
Journal Front Plant Sci
Date 2022 Aug 15
PMID 35968114
Abstract

Genome size variation within plant taxa is due to presence/absence variation, which may affect low-copy sequences or genomic repeats of various frequency classes. However, identifying the sequences underpinning genome size variation is challenging because genome assemblies commonly contain collapsed representations of repetitive sequences and because genome skimming studies by design miss low-copy number sequences. Here, we take a novel approach based on k-mers, short sub-sequences of equal length k, generated from whole-genome sequencing data of diploid eyebrights (Euphrasia), a group of plants that have considerable genome size variation within a ploidy level. We compare k-mer inventories within and between closely related species, and quantify the contribution of different copy number classes to genome size differences. We further match high-copy number k-mers to specific repeat types as retrieved from the RepeatExplorer2 pipeline. We find genome size differences of up to 230 Mbp, equivalent to more than 20% genome size variation. The largest contributions to these differences come from rDNA sequences, a 145-nt genomic satellite and a repeat associated with an Angela transposable element. We also find size differences in the low-copy number class (copy number ≤ 10×) of up to 27 Mbp, possibly indicating differences in gene space between our samples. We demonstrate that it is possible to pinpoint the sequences causing genome size variation within species without the use of a reference genome. Such sequences can serve as targets for future cytogenetic studies. We also show that studies of genome size variation should go beyond repeats if they aim to characterise the full range of genomic variants. To allow future work with other taxonomic groups, we share our k-mer analysis pipeline, which is straightforward to run, relying largely on standard GNU command line tools.
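The idea of comparing k-mer inventories by copy number class can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which, per the abstract, relies largely on GNU command line tools); it is a toy, in-memory illustration under stated assumptions. The function names, the class boundaries (10, 100, 1000) and the per-sample coverage values are all hypothetical. The sketch also assumes that dividing a k-mer's read count by the sequencing coverage approximates its genomic copy number, that summed copy-number differences approximate a size difference in base pairs, and it ignores reverse complements and sequencing errors.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def copy_class(copies, bounds=(10, 100, 1000)):
    """Assign a genomic copy number to a class; the bounds are illustrative."""
    for bound in bounds:
        if copies <= bound:
            return f"<={bound}x"
    return f">{bounds[-1]}x"


def size_difference_by_class(counts_a, counts_b, cov_a, cov_b):
    """Sum per-k-mer copy number differences (sample B minus sample A) per class.

    Read counts are divided by each sample's coverage to approximate genomic
    copy number. Each k-mer is binned by the larger of its two copy numbers.
    """
    diff = defaultdict(float)
    for kmer in counts_a.keys() | counts_b.keys():
        copies_a = counts_a.get(kmer, 0) / cov_a
        copies_b = counts_b.get(kmer, 0) / cov_b
        diff[copy_class(max(copies_a, copies_b))] += copies_b - copies_a
    return dict(diff)


# Toy usage: sample B carries a longer poly-A tract than sample A.
sample_a = count_kmers(["AAAAAA", "ACGT"], k=3)
sample_b = count_kmers(["A" * 20, "ACGT"], k=3)
for label, delta in sorted(size_difference_by_class(sample_a, sample_b, 1.0, 1.0).items()):
    print(label, delta)
```

In real data the k-mer counts would come from a dedicated k-mer counter rather than a Python loop, and coverage would have to be estimated per sample; the binning and summation logic is the part this sketch is meant to convey.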

Citing Articles

k-mer approaches for biodiversity genomics.

Jenike K, Campos-Dominguez L, Bodde M, Cerca J, Hodson C, Schatz M Genome Res. 2025; 35(2):219-230.

PMID: 39890468 PMC: 11874746. DOI: 10.1101/gr.279452.124.


nQuack: An R package for predicting ploidal level from sequence data using site-based heterozygosity.

Gaynor M, Landis J, O'Connor T, Laport R, Doyle J, Soltis D Appl Plant Sci. 2024; 12(4):e11606.

PMID: 39184199 PMC: 11342224. DOI: 10.1002/aps3.11606.
