
Bios2mds: an R Package for Comparing Orthologous Protein Families by Metric Multidimensional Scaling

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2012 Jun 19
PMID: 22702410
Citations: 25
Abstract

Background: The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low-dimensional space by metric multidimensional scaling (MDS). Applied to protein families, MDS complements the information derived from tree-based methods. Moreover, because MDS can add supplementary elements to a reference space, it offers a unique opportunity to compare orthologous sequence sets.
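To make the starting point concrete, the sketch below builds a pairwise distance matrix from a toy alignment. It is an illustration only, not the bios2mds implementation: the distance used here (fraction of differing residues over positions where neither sequence has a gap), the function names `p_distance` and `distance_matrix`, and the three toy sequences are all assumptions chosen for the example.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over positions where neither sequence has a gap.

    Assumption: this simple difference score stands in for whatever
    distance the reader actually wants to use.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped positions in common")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return (names, square matrix) for a dict mapping name -> aligned sequence."""
    names = list(alignment)
    matrix = [[p_distance(alignment[r], alignment[c]) for c in names]
              for r in names]
    return names, matrix


# Hypothetical toy alignment; real input would come from an alignment file.
toy_alignment = {"seq1": "MKV-LA", "seq2": "MKVALA", "seq3": "MRV-LG"}
names, dist = distance_matrix(toy_alignment)

for name, row in zip(names, dist):
    print(name, [round(value, 3) for value in row])
```

The resulting matrix is symmetric with a zero diagonal, which is the form that both distance-based tree methods and metric MDS expect as input.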

Results: The R package bios2mds (from BIOlogical Sequences to MultiDimensional Scaling) has been designed to analyze multiple sequence alignments by MDS. Bios2mds starts with a sequence alignment, builds a matrix of distances between the aligned sequences, and represents this matrix by MDS to visualize a sequence space. The package can also perform K-means clustering in the MDS-derived sequence space. Most importantly, bios2mds includes a function that projects supplementary elements (a.k.a. "out of sample" elements) onto the space defined by reference or "active" elements. Orthologous sequence sets can thus be compared in a straightforward way. The data analysis and visualization tools have been specifically designed for easy monitoring of the evolutionary drift of protein sub-families.
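The projection of supplementary elements can be sketched with classical metric MDS and the standard out-of-sample formula. The code below is a dependency-free illustration under stated assumptions, not the package's own code: the helper names (`double_center`, `top_eigenpairs`, `mds_fit`, `mds_project`) are hypothetical, the eigenvectors are obtained by plain power iteration (adequate only for small matrices whose leading eigenvalues are positive and well separated), and the input is four made-up 2-D points rather than sequences.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def double_center(d2):
    """B = -1/2 * J * D2 * J for a square matrix of squared distances."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    grand = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpairs(b, k, iters=500):
    """Leading k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    n = len(b)
    b = [row[:] for row in b]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def mds_fit(d2_active, k=2):
    """Coordinates of the active elements on the first k MDS axes."""
    pairs = top_eigenpairs(double_center(d2_active), k)
    coords = [[math.sqrt(lam) * v[i] for lam, v in pairs]
              for i in range(len(d2_active))]
    return coords, pairs


def mds_project(d2_active, d2_sup, pairs):
    """Place one supplementary element from its squared distances to the actives."""
    n = len(d2_active)
    row_mean = [sum(row) / n for row in d2_active]
    grand = sum(row_mean) / n
    sup_mean = sum(d2_sup) / n
    # Centered cross-product between the supplementary and each active element.
    b = [-0.5 * (d2_sup[i] - sup_mean - row_mean[i] + grand) for i in range(n)]
    return [sum(v[i] * b[i] for i in range(n)) / math.sqrt(lam)
            for lam, v in pairs]


# Made-up data: four active points and one supplementary point in the plane.
active = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0), (4.0, 2.0)]
supplementary = (1.0, 3.0)
d2_active = [[sq_dist(p, q) for q in active] for p in active]
d2_sup = [sq_dist(supplementary, p) for p in active]

coords, pairs = mds_fit(d2_active, k=2)
projected = mds_project(d2_active, d2_sup, pairs)
print("active coordinates:", coords)
print("projected supplementary element:", projected)
```

The key design point is that the supplementary element never enters the eigendecomposition: the reference space is fixed by the active elements alone, so adding or removing supplementary sets does not move the reference points. In a real analysis, a tested numerical routine would replace the power iteration used here.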

Conclusions: The bios2mds package provides the tools for a complete integrated pipeline aimed at the MDS analysis of multiple sets of orthologous sequences in the R statistical environment. In addition, because the analysis can also start from user-provided matrices, the projection function can be applied to any kind of data.

Citing Articles

Diversity, abundance, and domain architecture of plant NLR proteins in .

Negi V, Srinivasan R, Dutta B Heliyon. 2025; 10(14):e34475.

PMID: 39816363 PMC: 11734081. DOI: 10.1016/j.heliyon.2024.e34475.


Genotypic Clustering of H5N1 Avian Influenza Viruses in North America Evaluated by Ordination Analysis.

Tawidian P, Torchetti M, Killian M, Lantz K, Dilione K, Ringenberg J Viruses. 2025; 16(12).

PMID: 39772128 PMC: 11680268. DOI: 10.3390/v16121818.


Genomic variation in Plasmodium relictum (lineage SGS1) and its implications for avian malaria infection outcomes: insights from experimental infections and genome-wide analysis.

Kalbskopf V, Azelyte J, Palinauskas V, Hellgren O Malar J. 2024; 23(1):260.

PMID: 39210339 PMC: 11360878. DOI: 10.1186/s12936-024-05061-3.


DrivR-Base: a feature extraction toolkit for variant effect prediction model construction.

Francis A, Campbell C, Gaunt T Bioinformatics. 2024; 40(4).

PMID: 38603611 PMC: 11057939. DOI: 10.1093/bioinformatics/btae197.


In-Host HEV Quasispecies Evolution Shows the Limits of Mutagenic Antiviral Treatments.

Colomer-Castell S, Gregori J, Garcia-Cehic D, Riveiro-Barciela M, Buti M, Rando-Segura A Int J Mol Sci. 2023; 24(24).

PMID: 38139013 PMC: 10743355. DOI: 10.3390/ijms242417185.

