
Alignment-free Genomic Sequence Comparison Using FCGR and Signal Processing

Overview
Publisher Biomed Central
Specialty Biology
Date 2020 Jan 1
PMID 31888438
Citations 14
Abstract

Background: Alignment-free methods of genomic comparison can scale to large data sets of nucleotide sequences, each comprising several thousand or more base pairs. Such methods can be used to identify "nearby" species in a reference data set or to construct phylogenetic trees.

Results: We describe one such method that gives strong results. We use the Frequency Chaos Game Representation (FCGR) to create images from such sequences. We then reduce dimension, first using a Fourier trig transform and then a Singular Value Decomposition (SVD). This gives vectors of modest length, which in turn are used for fast sequence lookup, construction of phylogenetic trees, and classification of virus genomic data. We illustrate the accuracy and scalability of this approach on several benchmark test sets.
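The pipeline described in the Results can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. It assumes that a 2D discrete cosine transform stands in for the "Fourier trig transform", that corners are assigned as A, C, G, T in the order shown, and that only a low-frequency block of coefficients is kept. The names `fcgr`, `dct_matrix`, and `embed`, and the parameters `k`, `keep`, and `dims`, are hypothetical.

```python
import numpy as np

# Assumed corner assignment for the chaos game; conventions vary between papers.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=4):
    """Count k-mers into a 2^k x 2^k grid and normalize to frequencies."""
    n = 2 ** k
    grid = np.zeros((n, n))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # Read the k-mer backwards so its last base selects the largest quadrant.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = 2 * x + bx
            y = 2 * y + by
        grid[y, x] += 1
    total = grid.sum()
    return grid / total if total else grid


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    j = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * j / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def embed(seqs, k=4, keep=8, dims=3):
    """FCGR image -> 2D DCT -> low-frequency block -> SVD projection."""
    c = dct_matrix(2 ** k)
    feats = np.array(
        [(c @ fcgr(s, k) @ c.T)[:keep, :keep].ravel() for s in seqs]
    )
    feats = feats - feats.mean(axis=0)  # center before the SVD
    u, s, _ = np.linalg.svd(feats, full_matrices=False)
    dims = min(dims, len(s))
    return u[:, :dims] * s[:dims]  # one short vector per sequence
```

Each row returned by `embed` is a short vector for one sequence. Under these assumptions, a lookup of "nearby" sequences reduces to comparing rows, for example by Euclidean distance.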

Conclusions: The tandem of FCGR and dimension reduction using Fourier-type transforms and SVD provides a powerful approach for alignment-free genomic comparison. Results compare favorably with, and often surpass, the best results reported in prior literature. Good scalability is also observed.

Citing Articles

Overview and Prospects of DNA Sequence Visualization.

Wu Y, Xie X, Zhu J, Guan L, Li M. Int J Mol Sci. 2025; 26(2).

PMID: 39859192 PMC: 11764684. DOI: 10.3390/ijms26020477.


CGRclust: Chaos Game Representation for twin contrastive clustering of unlabelled DNA sequences.

Alipour F, Hill K, Kari L. BMC Genomics. 2024; 25(1):1214.

PMID: 39695938 PMC: 11657719. DOI: 10.1186/s12864-024-11135-y.


Exploring geometry of genome space via Grassmann manifolds.

Li X, Zhou T, Feng X, Yau S, Yau S. Innovation (Camb). 2024; 5(5):100677.

PMID: 39206218 PMC: 11350263. DOI: 10.1016/j.xinn.2024.100677.


PC-mer: An Ultra-fast memory-efficient tool for metagenomics profiling and classification.

Akbari Rokn Abadi S, Mohammadi A, Koohi S. PLoS One. 2024; 19(8):e0307279.

PMID: 39088438 PMC: 11293629. DOI: 10.1371/journal.pone.0307279.


Genome analysis through image processing with deep learning models.

Zhang Y, Imoto S. J Hum Genet. 2024; 69(10):519-525.

PMID: 39085457 PMC: 11422167. DOI: 10.1038/s10038-024-01275-0.

