
SNP-sites: Rapid Efficient Extraction of SNPs from Multi-FASTA Alignments

Overview
Journal: Microb Genom
Specialties: Genetics, Microbiology
Date: 2017 Mar 29
PMID: 28348851
Citations: 680
Abstract

Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory inefficient and installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open source license GNU GPL version 3.
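The core task described in the abstract, finding the variable columns of a multi-FASTA alignment and keeping only those, can be illustrated with a minimal Python sketch. This is not the tool's C implementation or its algorithm: the function names `read_fasta`, `variable_sites` and `extract_snps` are hypothetical, the whole alignment is held in memory (so the sketch has none of the memory efficiency reported above), and the decision to ignore gaps and ambiguous bases when deciding whether a column varies is an assumption made for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a multi-FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def variable_sites(records):
    """Return 0-based column indices holding at least two distinct bases.

    Assumption for this sketch: only A, C, G and T count; gaps and
    ambiguity codes such as N never make a column variable.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences in an alignment must have equal length")
    sites = []
    for i in range(length):
        bases = {seq[i].upper() for _, seq in records} & set("ACGT")
        if len(bases) > 1:
            sites.append(i)
    return sites


def extract_snps(records):
    """Return (site indices, alignment reduced to the variable columns)."""
    records = list(records)
    sites = variable_sites(records)
    reduced = [(name, "".join(seq[i] for i in sites)) for name, seq in records]
    return sites, reduced


# Toy alignment: only the third column (index 2) differs between taxa.
example = StringIO(">s1\nACGTA\n>s2\nACCTA\n>s3\nAC-TN\n")
sites, snps = extract_snps(read_fasta(example))
print(sites)
for name, seq in snps:
    print(f">{name}\n{seq}")
```

On the toy input, the last column contains A, A and N, so under the stated assumption it is not reported as a SNP. A real alignment of the size quoted in the abstract would need a streaming approach rather than loading every sequence at once.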

Citing Articles

Integrative and conjugative elements associated with antimicrobial resistance in multidrug resistant isolates from bovine respiratory disease (BRD)-affected animals in Spanish feedlots.

Serna C, Calderon Bernal J, Torre-Fuentes L, Garcia Munoz A, Diez Guerrier A, Hernandez M. Vet Q. 2025; 45(1):1-15.

PMID: 40055923 PMC: 11892046. DOI: 10.1080/01652176.2025.2474220.


In-Depth Genome-Based Analysis of Shigella spp. and Escherichia spp.: Resolving Ambiguities and Unveiling Phylogenetic Relationships.

Dif G, Djemouai N, Bouras N, Zitouni A. Curr Microbiol. 2025; 82(4):170.

PMID: 40045049 DOI: 10.1007/s00284-025-04158-5.


Genomic and pathogenicity analyses to identify the causative agent from multiple serogroups of non-O1, non-O139 in foodborne outbreaks.

Morita M, Hiyoshi H, Arakawa E, Izumiya H, Ohnishi M, Ogata K. Microb Genom. 2025; 11(2).

PMID: 40009544 PMC: 11865499. DOI: 10.1099/mgen.0.001364.


Probiogenomic analysis of SD7, a probiotic candidate with remarkable aggregation abilities.

Yaikhan T, Wonglapsuwan M, Pahumunto N, Nokchan N, Teanpaisan R, Surachat K. Heliyon. 2025; 11(3):e42451.

PMID: 40007772 PMC: 11850171. DOI: 10.1016/j.heliyon.2025.e42451.


Antimicrobial Resistance in Isolates from Bovine Mastitis Can Be Associated with Multidrug-Resistance-Mediating Integrative and Conjugative Elements (ICEs).

Jahnen J, Hanke D, Kadlec K, Schwarz S, Kruger-Haker H. Antibiotics (Basel). 2025; 14(2).

PMID: 40001397 PMC: 11851858. DOI: 10.3390/antibiotics14020153.

