Inferring Functional Modules of Protein Families with Probabilistic Topic Models
Overview
Affiliations
Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context.
Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules.
Conclusions: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.
An overview of topic modeling and its current applications in bioinformatics.
Liu L, Tang L, Dong W, Yao S, Zhou W Springerplus. 2016; 5(1):1608.
PMID: 27652181 PMC: 5028368. DOI: 10.1186/s40064-016-3252-8.
Understanding Genotype-Phenotype Effects in Cancer via Network Approaches.
Kim Y, Cho D, Przytycka T PLoS Comput Biol. 2016; 12(3):e1004747.
PMID: 26963104 PMC: 4786343. DOI: 10.1371/journal.pcbi.1004747.
Konietzny S, Pope P, Weimann A, McHardy A Biotechnol Biofuels. 2014; 7(1):124.
PMID: 25342967 PMC: 4189754. DOI: 10.1186/s13068-014-0124-8.
Song J, Zhang F, Tang S, Liu X, Gao Y, Lu P Evid Based Complement Alternat Med. 2013; 2013:731370.
PMID: 24376467 PMC: 3860149. DOI: 10.1155/2013/731370.
Metagenomic annotation networks: construction and applications.
Vey G, Moreno-Hagelsieb G PLoS One. 2012; 7(8):e41283.
PMID: 22879885 PMC: 3413691. DOI: 10.1371/journal.pone.0041283.