PMID: 36959975

Exploring Microbial Functional Biodiversity at the Protein Family Level: From Metagenomic Sequence Reads to Annotated Protein Clusters

Abstract

Metagenomics has opened access to the genetic repertoire of natural microbial communities, and metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. Several methods have been developed to process and analyze the sequence data, from raw reads to end products such as predicted protein sequences or protein families. In this article, we provide a thorough review that simplifies these processes and discuss the alternative methodologies that can be followed to explore biodiversity at the protein family level. We describe the available analysis tools and comment on their scalability, advantages, and disadvantages. Finally, we report the available data repositories and recommend approaches for protein family annotation related to phylogenetic distribution, structure prediction, and metadata enrichment.

Citing Articles

Visualizing metagenomic and metatranscriptomic data: A comprehensive review.

Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou N, Kokoli M, Panagiotopoulou E, et al. Comput Struct Biotechnol J. 2024; 23:2011-2033.

PMID: 38765606 PMC: 11101950. DOI: 10.1016/j.csbj.2024.04.060.


Unraveling the functional dark matter through global metagenomics.

Pavlopoulos G, Baltoumas F, Liu S, Selvitopi O, Camargo A, Nayfach S, et al. Nature. 2023; 622(7983):594-602.

PMID: 37821698 PMC: 10584684. DOI: 10.1038/s41586-023-06583-7.
