
GCphase: an SNP Phasing Method Using a Graph Partition and Error Correction Algorithm

Overview
Publisher Biomed Central
Specialty Biology
Date 2024 Aug 19
PMID 39160480
Abstract

Background: Long reads are now widely used for single nucleotide polymorphism (SNP) phasing, which supports research on human diseases and genetic studies in animals and plants. However, because the linkage relationships between SNP loci are complex and the reads contain sequencing errors, recent methods still do not yield satisfactory results.

Results: In this study, we present GCphase, a graph-based algorithm that uses a minimum-cut algorithm to perform phasing. First, based on the alignment between long reads and the reference genome, GCphase filters out ambiguous SNP sites and uninformative read information. Second, GCphase constructs a graph in which a vertex represents alleles of an SNP locus and each edge represents the presence of read support, and then applies a graph minimum-cut algorithm to phase the SNPs. Next, GCphase uses two error correction steps to refine the phasing results from the previous step, effectively reducing the error rate. Finally, GCphase outputs the phase blocks. GCphase was compared with three other methods, WhatsHap, HapCUT2, and LongPhase, on Nanopore and PacBio long-read datasets. The code is available from https://github.com/baimawjy/GCphase .
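
To make the graph-partition idea concrete, the following is a minimal sketch and not GCphase's implementation. The abstract does not specify how vertices, edge weights, or the cut are defined, so everything here is an assumption for illustration: one vertex per (SNP index, allele) pair, an edge weight equal to the number of reads that observe both alleles together, and the Stoer-Wagner global minimum cut as a stand-in for the minimum-cut step. The function names and the toy reads are hypothetical.

```python
from collections import defaultdict

def build_allele_graph(reads):
    """Build a weighted graph from reads given as {snp_index: allele} dicts.

    Assumption for illustration: each vertex is a (snp_index, allele) pair and
    each edge weight counts the reads that support both alleles together.
    """
    w = defaultdict(lambda: defaultdict(int))
    for read in reads:
        nodes = sorted(read.items())
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                u, v = nodes[i], nodes[j]
                w[u][v] += 1
                w[v][u] += 1
    return w

def stoer_wagner(w):
    """Global minimum cut (Stoer-Wagner). Returns (cut_weight, one_side)."""
    g = {u: dict(nbrs) for u, nbrs in w.items()}   # working copy
    groups = {u: {u} for u in g}                   # original vertices merged into u
    best_weight, best_side = float("inf"), set()
    while len(g) > 1:
        # One minimum-cut phase: repeatedly add the most tightly connected vertex.
        start = next(iter(g))
        added = [start]
        weights = {v: g[start].get(v, 0) for v in g if v != start}
        while weights:
            nxt = max(weights, key=weights.get)
            cut_of_phase = weights.pop(nxt)
            added.append(nxt)
            for v, wt in g[nxt].items():
                if v in weights:
                    weights[v] += wt
        s, t = added[-2], added[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, set(groups[t])
        # Merge the last vertex t into s before the next phase.
        groups[s] |= groups.pop(t)
        for v, wt in g.pop(t).items():
            if v == s:
                continue
            g[s][v] = g[s].get(v, 0) + wt
            g[v][s] = g[v].get(s, 0) + wt
            del g[v][t]
        g[s].pop(t, None)
    return best_weight, best_side

# Toy reads over 4 SNPs. True haplotypes: 0,1,0,1 and 1,0,1,0.
reads = [
    {0: 0, 1: 1, 2: 0}, {1: 1, 2: 0, 3: 1}, {0: 0, 2: 0, 3: 1},
    {0: 1, 1: 0, 2: 1}, {1: 0, 2: 1, 3: 0}, {0: 1, 2: 1, 3: 0},
    {0: 0, 1: 1, 2: 0, 3: 0},  # read with a sequencing error at SNP 3
]
cut_weight, side = stoer_wagner(build_allele_graph(reads))
haplotype = dict(sorted(side))  # {snp_index: allele} for one haplotype
print(cut_weight, haplotype)
```

In this toy input, the only edges crossing between the two haplotypes come from the erroneous read, so cutting them is cheaper than splitting either haplotype. On real data a global minimum cut can isolate a single weakly supported vertex, which is one reason a practical method needs filtering and error-correction steps around the partition.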

Conclusions: Experimental results show that, across the tested datasets and sequencing depths, GCphase produces the fewest switch errors and the highest accuracy among the compared methods.
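
The abstract does not define how switch errors are counted. As a hedged illustration of the commonly used convention, the sketch below counts a switch error wherever the predicted haplotype changes which true haplotype it follows between two consecutive heterozygous SNPs. The function name and the example strings are hypothetical, and the evaluation in the paper may differ in detail.

```python
def count_switch_errors(truth, predicted):
    """Count phase switches between a predicted haplotype and the truth.

    Both arguments are equal-length sequences of alleles (e.g. "0101") over
    the same heterozygous SNPs. A fully complemented prediction counts as
    zero switches, because it is the other haplotype of the same phasing.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same SNPs")
    # True where the prediction agrees with the truth at that SNP.
    agrees = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs wherever agreement changes between adjacent SNPs.
    return sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)

no_error = count_switch_errors("0101", "1010")    # complement of the truth
one_switch = count_switch_errors("0101", "0110")  # phase flips after SNP 2
```

Under this convention a single wrong allele in the middle of a block counts as two adjacent switches; some evaluations report that case separately as a flip error.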

Citing Articles

Genomic resources, opportunities, and prospects for accelerated improvement of millets.

Kasule F, Diack O, Mbaye M, Kakeeto R, Econopouly B. Theor Appl Genet. 2024; 137(12):273.

PMID: 39565376 PMC: 11579216. DOI: 10.1007/s00122-024-04777-9.
