
A Content-Based Retrieval Framework for Whole Metagenome Sequencing Samples

Overview
Specialty Biology
Date 2018 Oct 28
PMID 30367805
Abstract

Finding similarities and differences between metagenomic samples in large repositories is a significant challenge for researchers. In recent years, various studies have proposed content-based retrieval from different perspectives. In this study, we develop a content-based retrieval framework for identifying relevant metagenomic samples. The framework combines feature extraction, feature selection methods, and similarity measures for whole metagenome sequencing samples. We evaluated the performance of the framework on the given samples against a ground truth: a retrieved sample is labeled relevant (a positive sample) if it comes from a patient with the same disease as the query, and irrelevant otherwise. The experimental results show that relevant experiments can be detected using different fingerprinting approaches. We observed that Latent Semantic Analysis (LSA) is a promising fingerprinting approach for representing metagenomic samples and finding relevance among them. Source code and executable files are available at www.baskent.edu.tr/~hogul/WMS_retrieval.rar.
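
The retrieval pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the concrete features, similarity measure, or evaluation metric. The sketch assumes k-mer counts as features, LSA computed as a truncated singular value decomposition, cosine similarity between fingerprints, and precision among the top-k retrieved samples as the relevance score. All function names, reads, and labels below are hypothetical toy values.

```python
from itertools import product

import numpy as np


def kmer_counts(reads, k=3):
    """Count k-mers over all reads of one sample (assumed feature extractor)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = np.zeros(len(index))
    for read in reads:
        for i in range(len(read) - k + 1):
            j = index.get(read[i:i + k])
            if j is not None:  # skip k-mers with symbols outside A, C, G, T
                counts[j] += 1
    return counts


def lsa_fingerprints(X, n_components=2):
    """Project a samples-by-features matrix onto its top singular directions."""
    # Convert counts to relative frequencies so sequencing depth does not dominate.
    X = X / np.maximum(X.sum(axis=1, keepdims=True), 1)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def cosine_similarity(F):
    """Pairwise cosine similarity between the rows of F."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    Fn = F / np.maximum(norms, 1e-12)
    return Fn @ Fn.T


def retrieve(S, query, top_k):
    """Indices of the top_k samples most similar to the query, excluding itself."""
    order = np.argsort(-S[query], kind="stable")
    return [int(i) for i in order if i != query][:top_k]


def precision_at_k(S, labels, top_k):
    """Fraction of retrieved samples that share the query's label."""
    hits = []
    for q in range(len(labels)):
        hits.extend(labels[i] == labels[q] for i in retrieve(S, q, top_k))
    return float(np.mean(hits))


# Hypothetical toy data: two "disease" samples and two "healthy" samples.
samples = [
    ["ATATATATAT", "TATATATTAT"],
    ["ATATTATATA", "TTATATATAT"],
    ["GCGCGCGCGC", "CGCGCGGCGC"],
    ["GCGCCGCGCG", "CGCGCGCGCC"],
]
labels = ["disease", "disease", "healthy", "healthy"]

X = np.vstack([kmer_counts(reads) for reads in samples])
F = lsa_fingerprints(X, n_components=2)
S = cosine_similarity(F)

print("nearest neighbour of sample 0:", retrieve(S, 0, top_k=1))
print("precision at 1:", precision_at_k(S, labels, top_k=1))
```

In this toy setting the ground truth is the disease label, so a retrieved sample counts as relevant only when its label matches the query's. Real whole metagenome samples would need a larger k, a feature selection step, and a number of LSA components chosen empirically.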
