
A Comprehensive Evaluation of the Potential of Three Next-generation Short-read-based Plant Pan-genome Construction Strategies for the Identification of Novel Non-reference Sequence

Overview
Journal Front Plant Sci
Date 2024 Apr 3
PMID 38567138
Abstract

Pan-genome studies capture the full genomic diversity of a species, which makes them important for understanding plant evolution and guiding crop breeding. Three short-read-based strategies are used for plant pan-genome construction: iterative individual, iterative pooling, and map-to-pan. Their performance differs markedly under different conditions, yet no comprehensive evaluation has been conducted. Here, we evaluate the performance of these three strategies for plants under different sequencing depths and sample sizes. We also assess how the length and repeat content of novel sequences influence each strategy, and we compare the computational resources the three strategies consume. Our findings indicate that map-to-pan has the greatest recall but the lowest precision. In contrast, both iterative strategies have superior precision but lower recall. Sample number, novel sequence length, and the percentage of repeat content in novel sequences all adversely affect the performance of all three strategies. Increased sequencing depth improves map-to-pan's performance but does not affect the two iterative strategies. Map-to-pan also demands considerably more computational resources than the two iterative strategies. Overall, the iterative strategies, especially iterative pooling, are optimal when the sequencing depth is less than 20X. Map-to-pan is preferable when the sequencing depth exceeds 20X, despite its higher computational resource consumption.
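The precision/recall trade-off reported in the abstract can be made concrete with a small sketch. The abstract does not say how the authors score recovered sequences, so the set-based definition below (matching by sequence identifier) is an assumption. The function name `precision_recall` and the toy identifiers are hypothetical and chosen only to illustrate the pattern; they are not data from the study.

```python
def precision_recall(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for predicted novel-sequence IDs.

    Assumption: a prediction counts as correct when its ID is in the
    truth set. The paper may score at the base-pair or alignment level.
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four true novel sequences.
truth = {"n1", "n2", "n3", "n4"}

# A permissive caller finds every true sequence plus four false ones.
permissive_calls = {"n1", "n2", "n3", "n4", "x1", "x2", "x3", "x4"}

# A conservative caller reports only two sequences, both correct.
conservative_calls = {"n1", "n2"}

permissive = precision_recall(permissive_calls, truth)      # (0.5, 1.0)
conservative = precision_recall(conservative_calls, truth)  # (1.0, 0.5)
```

In this toy case the permissive caller mirrors the pattern the abstract attributes to map-to-pan (high recall, lower precision), while the conservative caller mirrors the iterative strategies (high precision, lower recall).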
