
A Machine Learning Bioinformatics Method to Predict Biological Activity from Biosynthetic Gene Clusters

Overview
Date 2021 May 27
PMID 34042443
Citations 33
Abstract

Research in natural products, the genetically encoded small molecules produced by organisms in an idiosyncratic fashion, deals with molecular structure, biosynthesis, and biological activity. Bioinformatics analyses of microbial genomes can successfully reveal the genetic instructions (biosynthetic gene clusters) that produce many natural products. Genes-to-molecule predictions made on biosynthetic gene clusters have revealed many important new structures, but there is no comparable method for genes-to-biological-activity predictions. To address this gap, we developed a machine learning bioinformatics method for predicting a natural product's antibiotic activity directly from the sequence of its biosynthetic gene cluster. We trained commonly used machine learning classifiers to predict antibacterial or antifungal activity based on features of known natural product biosynthetic gene clusters. We identified classifiers that attain accuracies as high as 80% and that enabled the identification of biosynthetic enzymes, and their corresponding molecular features, that are associated with antibiotic activity.
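The general idea in the abstract, training a classifier on features of known gene clusters and then inspecting which features are associated with activity, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy clusters, the placeholder domain names (`domain_A` and so on), the binary presence/absence encoding, and the choice of a hand-written logistic regression are not taken from the paper, whose actual features and classifiers are not specified in the abstract.

```python
import math

# Hypothetical toy data: each entry is (set of domain labels, activity label).
# The domain names are placeholders, not features taken from the paper.
CLUSTERS = [
    ({"domain_A", "domain_B"}, 1),
    ({"domain_A", "domain_C"}, 1),
    ({"domain_A", "domain_B", "domain_D"}, 1),
    ({"domain_C", "domain_D"}, 0),
    ({"domain_D"}, 0),
    ({"domain_B", "domain_C"}, 0),
]


def build_vocabulary(clusters):
    """Sorted list of every domain label seen in the training clusters."""
    return sorted(set().union(*(domains for domains, _ in clusters)))


def featurize(domains, vocab):
    """Binary presence/absence vector over the vocabulary."""
    return [1.0 if name in domains else 0.0 for name in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=2000, lr=0.5, l2=0.01):
    """Full-batch gradient descent on an L2-regularized logistic loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(domains, vocab, w, b):
    """Predicted probability that a cluster's product is active."""
    x = featurize(domains, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


vocab = build_vocabulary(CLUSTERS)
X = [featurize(domains, vocab) for domains, _ in CLUSTERS]
y = [label for _, label in CLUSTERS]
weights, bias = train_logistic(X, y)

# Fraction of training clusters classified correctly at a 0.5 threshold.
train_accuracy = sum(
    (predict_proba(domains, vocab, weights, bias) >= 0.5) == bool(label)
    for domains, label in CLUSTERS
) / len(CLUSTERS)

# Features ranked by learned weight: a crude proxy for "associated with activity".
ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1], reverse=True)
print("training accuracy:", train_accuracy)
print("feature most associated with activity:", ranked[0][0])
```

In this toy set the label depends only on whether `domain_A` is present, so the ranking step recovers that feature. Real clusters would need held-out evaluation rather than training accuracy, and a linear model is only one of several classifier families that could fill this role.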

Citing Articles

Amphibian skin bacteria contain a wide repertoire of genes linked to their antifungal capacities.

Gonzalez-Serrano F, Romero-Contreras Y, Orta A, Basanta M, Morales H, Sandoval Garcia G World J Microbiol Biotechnol. 2025; 41(3):78.

PMID: 40011297 PMC: 11865118. DOI: 10.1007/s11274-025-04292-z.


Artificial Intelligence in Natural Product Drug Discovery: Current Applications and Future Perspectives.

Gangwal A, Lavecchia A J Med Chem. 2025; 68(4):3948-3969.

PMID: 39916476 PMC: 11874025. DOI: 10.1021/acs.jmedchem.4c01257.


Interpretable adenylation domain specificity prediction using protein language models.

Adduri A, McNutt A, Ellington C, Suraparaju K, Fang N, Yan D bioRxiv. 2025.

PMID: 39868251 PMC: 11761653. DOI: 10.1101/2025.01.13.632878.


Exploring the Promoter Generation and Prediction of spp. Based on GAN and Multi-Model Fusion Methods.

Zhao C, Guan Y, Yan S, Li J Int J Mol Sci. 2024; 25(23).

PMID: 39684846 PMC: 11642183. DOI: 10.3390/ijms252313137.


TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting.

Zhao C, Yan S, Li J Int J Mol Sci. 2024; 25(22).

PMID: 39595936 PMC: 11594102. DOI: 10.3390/ijms252211866.

