Variant Analysis of SARS-CoV-2 Genomes

Overview
Specialty Public Health
Date 2020 Aug 4
PMID 32742035
Citations 311
Abstract

Objective: To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2).

Methods: Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants by aligning each genome pairwise to the reference genome NC_045512 with the EMBOSS needle program and extracting the differences. Nucleotide variants in the coding regions were converted to the corresponding encoded amino acid residues. For clade analysis, we used the open-source software Bayesian Evolutionary Analysis by Sampling Trees (BEAST), version 2.5.
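As an illustration of the variant-extraction step described in the Methods, the following Python sketch walks a gapped pairwise alignment (such as the two aligned rows that a global aligner like EMBOSS needle produces) and reports substitutions, insertions and deletions in reference coordinates. It is a minimal sketch, not the authors' pipeline: the function name `call_variants`, the tuple layout and the toy sequences are assumptions made for this example, and reading the aligner's output file is left out.

```python
def call_variants(ref_aln: str, qry_aln: str):
    """List differences between two rows of a pairwise alignment.

    Both strings must have the same length and use '-' for gaps.
    Returns tuples (kind, ref_pos, ref_allele, alt_allele) with 1-based
    reference positions. For insertions, ref_pos is the reference base
    immediately before the inserted sequence.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    variants = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-":
            # Gap in the reference row: the query carries an insertion.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            variants.append(("ins", ref_pos, "", qry_aln[i:j]))
            i = j
        elif q == "-":
            # Gap in the query row: the query carries a deletion.
            # Note: unsequenced genome ends would also show up here.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            variants.append(("del", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            # Report only unambiguous base changes (skips N and other codes).
            if r != q and r in "ACGT" and q in "ACGT":
                variants.append(("snv", ref_pos, r, q))
            i += 1
    return variants


if __name__ == "__main__":
    # Toy alignment, not real SARS-CoV-2 sequence.
    for variant in call_variants("ACGT-ACGTAC", "ATGTGAC--AC"):
        print(variant)
```

Reporting positions in reference coordinates is what lets the same variant be counted across many genomes, as in a label like 3037C > T.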

Findings: We identified 5775 distinct genome variants, including 2969 missense mutations, 1965 synonymous mutations, 484 mutations in the non-coding regions, 142 non-coding deletions, 100 in-frame deletions, 66 non-coding insertions, 36 stop-gained variants, 11 frameshift deletions and two in-frame insertions. The most common variants were the synonymous 3037C > T (6334 samples), P4715L in open reading frame 1ab (6319 samples) and D614G in the spike protein (6294 samples). We identified six major clades (that is, basal, D614G, L84S, L3606F, D448del and G392D) and 14 subclades. Regarding base changes, C > T was the most common, with 1670 distinct variants.
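The missense, synonymous and stop-gained categories reported above follow from translating the affected codon before and after a base change. The Python sketch below shows that logic for a single-nucleotide change in a coding sequence, using the standard genetic code. It is a simplified illustration under stated assumptions, not the authors' method: `classify_snv`, the label format and the toy coding sequence are hypothetical, and indels, frameshifts and stop-lost changes are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(cds: str, pos: int, alt: str):
    """Classify a single-base change in a coding sequence.

    cds: coding sequence starting at the first base of the start codon.
    pos: 1-based position of the changed base within cds.
    alt: the new base.
    Returns (consequence, label), with label like 'D2G'
    (reference residue, codon number, new residue).
    """
    if not 1 <= pos <= len(cds) - len(cds) % 3:
        raise ValueError("position is outside the complete codons of cds")
    codon_index, offset = divmod(pos - 1, 3)
    ref_codon = cds[3 * codon_index: 3 * codon_index + 3]
    if ref_codon[offset] == alt:
        raise ValueError("alt base equals the reference base")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        consequence = "synonymous"
    elif alt_aa == "*":
        consequence = "stop_gained"
    else:
        # Simplification: a change at a stop codon is also reported here.
        consequence = "missense"
    return consequence, f"{ref_aa}{codon_index + 1}{alt_aa}"


if __name__ == "__main__":
    toy_cds = "ATGGATTGG"  # toy sequence encoding M, D, W (not viral sequence)
    print(classify_snv(toy_cds, 5, "G"))  # GAT -> GGT
    print(classify_snv(toy_cds, 6, "C"))  # GAT -> GAC
    print(classify_snv(toy_cds, 9, "A"))  # TGG -> TGA
```

In this toy sequence, the change at position 5 turns an aspartate codon into a glycine codon, the same kind of residue change that the label D614G denotes in the spike protein.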

Conclusion: We found that several variants of the SARS-CoV-2 genome exist and that the D614G clade has become the most common variant since December 2019. The evolutionary analysis indicated structured transmission, with the possibility of multiple introductions into the population.

Citing Articles

Evaluation of Genomic Surveillance of SARS-CoV-2 Virus Isolates and Comparison of Mutational Spectrum of Variants in Bangladesh.

Sultana A, Banu L, Hossain M, Azmin N, Nila N, Sinha S Viruses. 2025; 17(2).

PMID: 40006937 PMC: 11860708. DOI: 10.3390/v17020182.


Model predicted human mobility explains COVID-19 transmission in urban space without behavioral data.

Han Z, Xu F, Li Y, Jiang T, Evans J Sci Rep. 2025; 15(1):6365.

PMID: 39984518 PMC: 11845774. DOI: 10.1038/s41598-025-87363-3.


mRNA Vaccines Against COVID-19 as Trailblazers for Other Human Infectious Diseases.

Brandi R, Paganelli A, D'Amelio R, Giuliani P, Lista F, Salemi S Vaccines (Basel). 2025; 12(12).

PMID: 39772079 PMC: 11680146. DOI: 10.3390/vaccines12121418.


Visual codon: a user-friendly Python program for viewing and optimizing gene GC content.

Lin S, Xu F, Huang B, Zhao L, Pan D, Lin S PeerJ. 2024; 12:e18755.

PMID: 39717051 PMC: 11665431. DOI: 10.7717/peerj.18755.


DiMA: sequence diversity dynamics analyser for viruses.

Tharanga S, Unlu E, Hu Y, Sjaugi M, Celik M, Hekimoglu H Brief Bioinform. 2024; 26(1).

PMID: 39592151 PMC: 11596295. DOI: 10.1093/bib/bbae607.


References
1.
Wu Z, McGoogan J. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention. JAMA. 2020; 323(13):1239-1242. DOI: 10.1001/jama.2020.2648.

2.
Lu R, Zhao X, Li J, Niu P, Yang B, Wu H. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020; 395(10224):565-574. PMC: 7159086. DOI: 10.1016/S0140-6736(20)30251-8.

3.
Richardson S, Hirsch J, Narasimhan M, Crawford J, McGinn T, Davidson K. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA. 2020; 323(20):2052-2059. PMC: 7177629. DOI: 10.1001/jama.2020.6775.

4.
Virtanen P, Gommers R, Oliphant T, Haberland M, Reddy T, Cournapeau D. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020; 17(3):261-272. PMC: 7056644. DOI: 10.1038/s41592-019-0686-2.

5.
Needleman S, Wunsch C. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970; 48(3):443-53. DOI: 10.1016/0022-2836(70)90057-4.