
How Much Metagenome Data is Needed for Protein Structure Prediction: The Advantages of Targeted Approach from the Ecological and Evolutionary Perspectives

Overview
Journal: iMeta
Specialty: Biology
Date: 2024 Jun 13
PMID: 38867727
Abstract

It has been shown that three-dimensional protein structures can be modeled by supplementing the homologous sequences of a protein with metagenome sequences. Even though large volumes of metagenome data are already used for this purpose, a significant proportion of proteins remain unsolved. In this review, we focus on identifying ecological and evolutionary patterns in metagenome data, decoding the complex relationships between these patterns and protein structures, and investigating how these patterns can be used effectively to improve protein structure prediction. First, we proposed the metagenome utilization efficiency and the marginal effect model to quantify the divergent distribution of homologous sequences of a protein family. Second, we showed that a targeted approach, which searches for homologous sequences in specified biomes, identifies them more effectively than the blind search of an untargeted approach. Finally, we determined the lower bound on the amount of metagenome data required to predict the structures of all protein families in the Pfam database and showed that currently available metagenome data are insufficient for this purpose. In summary, we discovered ecological and evolutionary patterns in metagenome data that can be used to predict protein structures effectively, and the targeted approach is a promising way to exploit these patterns for extracting homologous sequences and predicting protein structures.
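The abstract does not give formal definitions of utilization efficiency, the marginal effect, or the targeted search, so the following is only a toy sketch under assumed definitions, not the review's method. It assumes that (1) utilization efficiency is the number of unique homologs found per sequence searched, (2) the marginal effect is the number of new unique homologs each additional sample contributes, and (3) a targeted search simply restricts the search to samples from specified biomes. The names `Sample`, `cumulative_unique_homologs`, `marginal_effect`, `utilization_efficiency` and `targeted`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A metagenome sample in a hypothetical toy model (not the review's data format)."""
    biome: str
    n_sequences: int        # sequences that must be searched in this sample
    homolog_ids: frozenset  # IDs of sequences homologous to the query protein family


def cumulative_unique_homologs(samples):
    """Unique homologs accumulated as samples are searched in the given order."""
    seen, curve = set(), []
    for s in samples:
        seen |= s.homolog_ids
        curve.append(len(seen))
    return curve


def marginal_effect(curve):
    """Assumed definition: new unique homologs contributed by each additional sample."""
    return [b - a for a, b in zip([0] + curve[:-1], curve)]


def utilization_efficiency(samples):
    """Assumed definition: unique homologs found per sequence searched."""
    searched = sum(s.n_sequences for s in samples)
    return cumulative_unique_homologs(samples)[-1] / searched if searched else 0.0


def targeted(samples, biomes):
    """Assumed targeted search: keep only samples from the specified biomes."""
    return [s for s in samples if s.biome in biomes]


# Hypothetical toy data: homologs are concentrated in gut samples.
samples = [
    Sample("soil", 1000, frozenset({"a", "b"})),
    Sample("marine", 1000, frozenset({"c"})),
    Sample("gut", 1000, frozenset({"d", "e", "f", "g"})),
    Sample("gut", 1000, frozenset({"e", "f", "h"})),
    Sample("soil", 1000, frozenset({"a"})),
]

curve = cumulative_unique_homologs(samples)
print(curve)                           # [2, 3, 7, 8, 8]
print(marginal_effect(curve))          # [2, 1, 4, 1, 0]
print(utilization_efficiency(samples))  # 0.0016

gut_only = targeted(samples, {"gut"})
print(cumulative_unique_homologs(gut_only))  # [4, 5]
print(utilization_efficiency(gut_only))      # 0.0025
```

In this toy example the marginal effect drops to zero once the remaining samples only repeat known homologs, and restricting the search to the gut biome raises the yield per sequence searched from 0.0016 to 0.0025, while missing the homologs `a`, `b` and `c` found elsewhere. Whether a targeted search pays off therefore depends on choosing the right biomes for each protein family.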

Citing Articles

Leveraging computer-aided design and artificial intelligence to develop a next-generation multi-epitope tuberculosis vaccine candidate.

Zhuang L, Ali A, Yang L, Ye Z, Li L, Ni R Infect Med (Beijing). 2024; 3(4):100148.

PMID: 39687693 PMC: 11647498. DOI: 10.1016/j.imj.2024.100148.


MicroEXPERT: Microbiome profiling platform with cross-study metagenome-wide association analysis functionality.

Yang P, Yang J, Long H, Huang K, Ji L, Lin H Imeta. 2024; 2(4):e131.

PMID: 38868224 PMC: 10989818. DOI: 10.1002/imt2.131.


iMeta: Integrated meta-omics for biology and environments.

Liu Y, Chen T, Li D, Fu J, Liu S Imeta. 2024; 1(1):e15.

PMID: 38867730 PMC: 10989748. DOI: 10.1002/imt2.15.


How much metagenome data is needed for protein structure prediction: The advantages of targeted approach from the ecological and evolutionary perspectives.

Yang P, Ning K Imeta. 2024; 1(1):e9.

PMID: 38867727 PMC: 10989767. DOI: 10.1002/imt2.9.
