
FindGSE: Estimating Genome Size Variation Within Human and Arabidopsis Using K-mer Frequencies

Overview
Journal Bioinformatics
Specialty Biology
Date 2018 Feb 15
PMID 29444236
Citations 96
Abstract

Motivation: Analyzing k-mer frequencies in whole-genome sequencing data is becoming a common method for estimating genome size (GS). However, the accuracy of the method remains uninvestigated, in particular whether it can capture intra-species GS variation.

Results: We present findGSE, which fits skew normal distributions to k-mer frequencies to estimate GS. findGSE outperformed existing tools in an extensive simulation study. In estimating the GSs of 89 Arabidopsis thaliana accessions, findGSE showed the highest capability in capturing GS variation. In an application with 71 female and 71 male human individuals, findGSE delivered an average of 3039 Mb as the haploid human GS, while female genomes were on average 41 Mb larger than male genomes, in astonishing agreement with the size difference between the X and Y chromosomes. Further analysis showed that human GS variation is linked to geographical patterns and differs significantly between populations, which can be explained by variable abundances of LINE-1 retrotransposons.
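To illustrate the general idea behind k-mer-based GS estimation, here is a minimal Python sketch. It is not findGSE's method: findGSE fits skew normal distributions to the k-mer frequency histogram, whereas this sketch simply divides the total number of k-mers by the depth at the main histogram peak. The function names and the `min_depth` cutoff for discarding likely error k-mers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {depth: number of distinct k-mers observed that many times}."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over each read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Naive estimate: total k-mers divided by the depth of the main peak.

    Assumption (not from findGSE): k-mers seen fewer than `min_depth` times
    are treated as sequencing errors and ignored.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates the sequencing depth.
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram: an error peak at depth 1 and a main peak near depth 20.
    toy_hist = {1: 500, 19: 50, 20: 100, 21: 50}
    print(estimate_genome_size(toy_hist))
```

In practice the histogram would come from a dedicated k-mer counter run on the full read set, and a peak-only estimate like this one ignores heterozygosity and repeat structure, which is the kind of signal a fitted distribution is meant to capture.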

Availability and Implementation: The findGSE R package is freely available at https://github.com/schneebergerlab/findGSE and is supported on Linux and Mac systems.

Contact: schneeberger@mpipz.mpg.de.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

A Chromosome-level genome assembly of the American bullfrog (Aquarana catesbeiana).

Zhang K, Zhang Y, Tian Y, Xu B, Jiang X, Qin Z Sci Data. 2025; 12(1):413.

PMID: 40064910 PMC: 11893809. DOI: 10.1038/s41597-025-04697-3.


Telomere-to-telomere, gap-free genome of mung bean (Vigna radiata) provides insights into domestication under structural variation.

Jia K, Li G, Wang L, Liu M, Wang Z, Li R Hortic Res. 2025; 12(3):uhae337.

PMID: 40061812 PMC: 11886820. DOI: 10.1093/hr/uhae337.


k-mer approaches for biodiversity genomics.

Jenike K, Campos-Dominguez L, Bodde M, Cerca J, Hodson C, Schatz M Genome Res. 2025; 35(2):219-230.

PMID: 39890468 PMC: 11874746. DOI: 10.1101/gr.279452.124.


Chromosome-level genome assembly of Pontederia cordata L. provides insights into its rapid adaptation and variation of flower colours.

Wang J, Wang J, Zhang W, Zhang W, Yang X, Yang X DNA Res. 2025; 32(2).

PMID: 39878035 PMC: 11879222. DOI: 10.1093/dnares/dsaf002.


A High-Quality Phased Genome Assembly of Stinging Nettle (Urtica dioica ssp. dioica).

Hirabayashi K, Dumigan C, Kucka M, Percy D, Guerriero G, Cronk Q Plants (Basel). 2025; 14(1).

PMID: 39795384 PMC: 11722821. DOI: 10.3390/plants14010124.