Pushing the Limits of HiFi Assemblies Reveals Centromere Diversity Between Two Arabidopsis thaliana Genomes

Overview
Specialty Biochemistry
Date 2022 Dec 1
PMID 36453992
Abstract

Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.

Citing Articles

TIPPo: A User-Friendly Tool for De Novo Assembly of Organellar Genomes with High-Fidelity Data.

Xian W, Bezrukov I, Bao Z, Vorbrugg S, Gautam A, Weigel D. Mol Biol Evol. 2025; 42(1).

PMID: 39800935 PMC: 11725521. DOI: 10.1093/molbev/msae247.


A chromosome-level genome assembly of a model conifer plant, the Japanese cedar, Cryptomeria japonica D. Don.

Fujino T, Yamaguchi K, Yokoyama T, Hamanaka T, Harazono Y, Kamada H. BMC Genomics. 2024; 25(1):1039.

PMID: 39501145 PMC: 11539532. DOI: 10.1186/s12864-024-10929-4.


Atlas of telomeric repeat diversity in Arabidopsis thaliana.

Tao Y, Xian W, Bao Z, Rabanal F, Movilli A, Lanz C. Genome Biol. 2024; 25(1):244.

PMID: 39285474 PMC: 11406999. DOI: 10.1186/s13059-024-03388-3.


Diploid genome assembly of the Malbec grapevine cultivar enables haplotype-aware analysis of transcriptomic differences underlying clonal phenotypic variation.

Calderon L, Carbonell-Bejerano P, Munoz C, Bree L, Sola C, Bergamin D. Hortic Res. 2024; 11(5):uhae080.

PMID: 38766532 PMC: 11101320. DOI: 10.1093/hr/uhae080.


A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range.

Lian Q, Huettel B, Walkemeier B, Mayjonade B, Lopez-Roques C, Gil L. Nat Genet. 2024; 56(5):982-991.

PMID: 38605175 PMC: 11096106. DOI: 10.1038/s41588-024-01715-9.

