Accurate Circular Consensus Long-read Sequencing Improves Variant Detection and Assembly of a Human Genome
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions (indels) <50 bp, and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the Genome in a Bottle (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
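The abstract reports its results through a few standard metrics: per-read accuracy, variant-calling precision and recall, and assembly contig N50. The sketch below shows how these quantities are commonly computed, assuming textbook definitions; it is not the authors' benchmarking pipeline. The Phred conversion is a widely used convention for expressing accuracy (99.8% corresponds to roughly Q27, and 99.997% concordance to roughly Q45), and the function names and toy inputs are hypothetical illustrations, not data from the study.

```python
import math


def phred_qv(accuracy: float) -> float:
    """Convert a fractional accuracy (e.g. 0.998) to a Phred-scaled quality value.

    Assumes the common convention QV = -10 * log10(error rate).
    """
    error = 1.0 - accuracy
    if error <= 0:
        raise ValueError("accuracy must be below 1.0 for a finite QV")
    return -10.0 * math.log10(error)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)  # fraction of called variants that are correct
    recall = tp / (tp + fn)     # fraction of benchmark variants that were called
    return precision, recall


def n50(lengths: list[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("contig list is empty")


if __name__ == "__main__":
    print(f"QV at 99.8% accuracy: {phred_qv(0.998):.1f}")
    print(f"QV at 99.997% concordance: {phred_qv(0.99997):.1f}")
    # Hypothetical counts, for illustration only.
    print(precision_recall(tp=90, fp=10, fn=30))
    # Hypothetical contig lengths (total 290, half 145): 100 + 80 >= 145, so N50 is 80.
    print(n50([100, 80, 50, 30, 20, 10]))
```

Sorting contigs from longest to shortest before accumulating is what makes N50 reflect contiguity: a few long contigs reach half the total length quickly and push N50 up, whereas many short fragments pull it down.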