
SINA: Accurate High-throughput Multiple Sequence Alignment of Ribosomal RNA Genes

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2012 May 5
PMID: 22556368
Citations: 1199
Abstract

Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes such as the ribosomal RNA (rRNA) genes, for which millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements.

Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high-throughput performance demands. SINA was evaluated in comparison with the commonly used high-throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences, respectively. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks.
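
To make the k-mer searching step more concrete, the following is a minimal Python sketch of how a query sequence might be matched to its closest reference sequences by counting shared k-mers. It is an illustration under stated assumptions, not SINA's implementation: the function names (`kmers`, `build_index`, `closest_references`), the default value of `k`, and the ranking by raw shared k-mer count are all hypothetical, and the partial order alignment stage that would follow is not shown.

```python
from __future__ import annotations

from collections import Counter


def kmers(seq: str, k: int = 10) -> set[str]:
    """Return the distinct k-mers of a sequence, ignoring alignment gaps."""
    # Gap characters ('-' and '.') are dropped so aligned references can be indexed.
    s = seq.upper().replace("-", "").replace(".", "")
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_index(references: dict[str, str], k: int = 10) -> dict[str, set[str]]:
    """Map every k-mer to the names of the reference sequences containing it."""
    index: dict[str, set[str]] = {}
    for name, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def closest_references(
    query: str, index: dict[str, set[str]], k: int = 10, top_n: int = 3
) -> list[tuple[str, int]]:
    """Rank references by the number of distinct k-mers shared with the query."""
    hits: Counter[str] = Counter()
    for km in kmers(query, k):
        for name in index.get(km, ()):
            hits[name] += 1
    return hits.most_common(top_n)
```

In this sketch, the references returned by `closest_references` would serve as the small template set against which the query is then aligned; restricting the expensive alignment step to a few close references is one plausible way such a design keeps throughput high.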

Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.

Citing Articles

Comparison of naturalization mouse model setups uncover distinct effects on intestinal mucosa depending on microbial experience.

Arnesen H, Birkeland S, Stendahl H, Neuhaus K, Masopust D, Boysen P. Discov Immunol. 2025; 4(1):kyaf002.

PMID: 40065807 PMC: 11892432. DOI: 10.1093/discim/kyaf002.


Robust phylogenetic tree-based microbiome association test using repeatedly measured data for composition bias.

Kim K, Won S. BMC Bioinformatics. 2025; 26(1):75.

PMID: 40050732 PMC: 11887327. DOI: 10.1186/s12859-024-06002-2.


What defines a photosynthetic microbial mat in western Antarctica?

Mercado-Juarez R, Valdespino-Castillo P, Merino Ibarra M, Batista S, Mac Cormack W, Ruberto L. PLoS One. 2025; 20(3):e0315919.

PMID: 40043057 PMC: 11882083. DOI: 10.1371/journal.pone.0315919.


A Novel Splice Variant Confers Susceptibility to Otitis Media in Humans.

Elling C, Ryan A, Yarza T, Ghaffar A, Llanes E, Kofonow J. Int J Mol Sci. 2025; 26(4).

PMID: 40003878 PMC: 11855725. DOI: 10.3390/ijms26041411.


Ammonifying and phosphorus-solubilizing function of sp. nov. isolated from bloom and algal-bacterial interactions.

Li F, Xu M, Pan L, Li J, Lan C, Li Z. Front Microbiol. 2025; 16:1516993.

PMID: 39996082 PMC: 11849500. DOI: 10.3389/fmicb.2025.1516993.

