Improving Read Alignment Through the Generation of Alternative Reference Via Iterative Strategy

Overview
Journal Sci Rep
Specialty Science
Date 2020 Oct 31
PMID 33127969
Citations 3
Abstract

Most species have a single standard reference sequence. When other breeds of the species carry extensive variation, aligning their reads to that reference can produce ambiguous alignments and inaccurate variant calls, which in turn compromise the accuracy of downstream analysis. Here, with the help of an FPGA hardware platform, we present a method that generates an alternative reference through an iterative strategy to improve read alignment for breeds that are genetically distant from the reference breed. Compared with the published reference genomes, the alternative reference sequences we built improved the mapping rates of Chinese indigenous pigs and chickens by 0.61-1.68% and 0.09-0.45%, respectively. These sequences also enable researchers to recover highly variable regions that could be missed when using public reference sequences. We also determined that the optimal numbers of iterations for generating alternative reference sequences were seven for pigs and five for chickens. Our results show that, for genetically distant breeds, generating an alternative reference sequence can facilitate read alignment and variant calling and improve the accuracy of downstream analyses.
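The iterative idea in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the abstract does not describe the aligner, the variant caller, or the stopping rule, so everything below is an assumption made for illustration. An ungapped Hamming-distance mapper stands in for a real aligner, a per-position majority vote stands in for variant calling, and the loop stops as soon as the mapping rate no longer improves. All function names and parameters (`best_hit`, `call_consensus`, `build_alternative_reference`, `max_mismatches`, `max_iterations`) are hypothetical.

```python
from collections import Counter


def best_hit(reference, read):
    """Return (offset, mismatches) of the best ungapped placement, or None."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def map_reads(reference, reads, max_mismatches):
    """Keep reads whose best placement has at most max_mismatches mismatches."""
    hits = []
    for read in reads:
        hit = best_hit(reference, read)
        if hit is not None and hit[1] <= max_mismatches:
            hits.append((hit[0], read))
    return hits


def mapping_rate(reference, reads, max_mismatches):
    """Fraction of reads that map under the toy mismatch threshold."""
    if not reads:
        return 0.0
    return len(map_reads(reference, reads, max_mismatches)) / len(reads)


def call_consensus(reference, hits):
    """Replace each covered base with the majority base of the mapped reads."""
    votes = [Counter() for _ in reference]
    for offset, read in hits:
        for i, base in enumerate(read):
            votes[offset + i][base] += 1
    return "".join(
        v.most_common(1)[0][0] if v else ref_base
        for v, ref_base in zip(votes, reference)
    )


def build_alternative_reference(reference, reads, max_mismatches=1,
                                max_iterations=10):
    """Iterate map -> consensus until the mapping rate stops improving.

    Assumed stopping rule (not taken from the paper): accept a new
    reference only if it maps strictly more reads than the current one.
    """
    best_ref = reference
    best_rate = mapping_rate(best_ref, reads, max_mismatches)
    history = [best_rate]
    for _ in range(max_iterations):
        hits = map_reads(best_ref, reads, max_mismatches)
        candidate = call_consensus(best_ref, hits)
        rate = mapping_rate(candidate, reads, max_mismatches)
        if rate <= best_rate:
            break
        best_ref, best_rate = candidate, rate
        history.append(rate)
    return best_ref, history


# Toy data: a "breed" genome that differs from the reference at two
# adjacent positions, with reads sampled from the breed genome.
breed = "ACGTTGCAAGCTTAGGCATC"
reference = breed[:9] + "TA" + breed[11:]
reads = [breed[i:i + 8] for i in range(len(breed) - 7)]
alt_ref, history = build_alternative_reference(reference, reads)
print(history)
```

In this sketch `history` records the mapping rate of each accepted reference, so its length minus one is the number of productive iterations. A real workflow would replace the toy mapper and consensus step with a production aligner and variant caller, and would need a criterion for the optimal iteration count that matches the study's own definition.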

Citing Articles

Comparative population genomics reveals convergent and divergent selection in the apricot-peach-plum-mei complex.

Yang X, Su Y, Huang S, Hou Q, Wei P, Hao Y Hortic Res. 2024; 11(6):uhae109.

PMID: 38883333 PMC: 11179850. DOI: 10.1093/hr/uhae109.


Characterization of complex structural variation in the gene loci using single-molecule long-read sequencing.

Turner A, Derezinski A, Gaedigk A, Berres M, Gregornik D, Brown K Front Pharmacol. 2023; 14:1195778.

PMID: 37426826 PMC: 10324673. DOI: 10.3389/fphar.2023.1195778.


Whole-genome analysis reveals the hybrid formation of Chinese indigenous DHB pig following human migration.

Wang Y, Zhang C, Peng Y, Cai X, Hu X, Bosse M Evol Appl. 2022; 15(3):501-514.

PMID: 35386394 PMC: 8965386. DOI: 10.1111/eva.13366.
