PMID: 31406327

Accurate Circular Consensus Long-read Sequencing Improves Variant Detection and Assembly of a Human Genome

Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels), and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the Genome in a Bottle (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
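The abstract reports read quality as a percentage accuracy, variant-calling performance as precision and recall, and assembly contiguity as contig N50. The minimal Python sketch below shows how these standard metrics are computed; the function names and toy inputs are illustrative assumptions, not the authors' pipeline or data. For example, 99.8% per-base accuracy corresponds to a Phred quality of about Q27.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert per-base accuracy (a fraction, e.g. 0.998) to Phred scale: Q = -10 * log10(error)."""
    error = 1.0 - accuracy
    return -10.0 * math.log10(error)


def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    tp/fp/fn are counts of true-positive, false-positive and false-negative
    variant calls against a truth set (hypothetical inputs here).
    """
    return tp / (tp + fp), tp / (tp + fn)


def n50(lengths: list) -> int:
    """Contig N50: sort contigs longest first and return the length at which
    the running total first reaches at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    print(round(phred_from_accuracy(0.998), 1))  # 27.0
    print(precision_recall(tp=90, fp=10, fn=10))  # toy counts, not from the paper
    print(n50([10, 8, 6, 4, 2]))  # 8
```

The toy N50 call returns 8 because the two longest contigs (10 + 8 = 18) are the first to cover at least half of the 30-unit total.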

Citing Articles

alginate lyase and Psl glycoside hydrolase inhibit biofilm formation by CF2843 on three-dimensional aggregates of lung epithelial cells.

Neetu, Pal S, Subramanian S, Ramya T. Biofilm. 2025; 9:100265.

PMID: 40066315 PMC: 11891150. DOI: 10.1016/j.bioflm.2025.100265.


Chromosome-level genome assembly of Jaguar guapote (Parachromis manguensis) by massive parallel sequencing.

Cao J, Tong Y, Xiao Z, Chen H, Liu Z. Sci Data. 2025; 12(1):411.

PMID: 40064893 PMC: 11894119. DOI: 10.1038/s41597-025-04752-z.


GoldPolish-target: targeted long-read genome assembly polishing.

Zhang E, Coombe L, Wong J, Warren R, Birol I. BMC Bioinformatics. 2025; 26(1):78.

PMID: 40055584 PMC: 11887200. DOI: 10.1186/s12859-025-06091-7.


A chromosome-level reference genome facilitates the discovery of clubroot-resistant gene in Chinese cabbage.

Yang S, Wang X, Wang Z, Zhang W, Su H, Wei X. Hortic Res. 2025; 12(3):uhae338.

PMID: 40046320 PMC: 11879649. DOI: 10.1093/hr/uhae338.


Impacts of ribosomal RNA sequence variation on gene expression and phenotype.

Welfer G, Brady R, Natchiar S, Watson Z, Rundlet E, Alejo J. Philos Trans R Soc Lond B Biol Sci. 2025; 380(1921):20230379.

PMID: 40045785 PMC: 11883441. DOI: 10.1098/rstb.2023.0379.

