
MetaSim: a Sequencing Simulator for Genomics and Metagenomics

Overview
Journal PLoS One
Date 2008 Oct 9
PMID 18841204
Citations 198
Abstract

Background: The new research field of metagenomics is providing exciting insights into various previously unclassified ecological systems. Next-generation sequencing technologies are producing a rapid increase in environmental data in public databases. There is a great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets.

Methodology/Principal Findings: To facilitate the development and improvement of metagenomic tools and the planning of metagenomic projects, we introduce a sequencing simulator called MetaSim. Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomic composition of typical metagenome data sets. Starting from a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to collect reads from that metagenome by simulating one of several sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.
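The general idea of drawing reads from a designed community can be illustrated with a minimal sketch. This is not MetaSim's implementation or interface: the function name `simulate_reads`, its parameters, the toy genomes, and the uniform substitution error model are all assumptions made for illustration. A real simulator would use technology-specific error and read-length models.

```python
import random


def simulate_reads(genomes, abundances, n_reads, read_len=50,
                   error_rate=0.01, seed=0):
    """Draw reads from a toy metagenome (illustrative sketch only).

    genomes:    dict mapping genome name -> DNA string
    abundances: dict mapping genome name -> relative copy number
    Returns a list of (genome name, start position, read sequence).
    """
    rng = random.Random(seed)
    names = list(genomes)
    for name in names:
        if len(genomes[name]) < read_len:
            raise ValueError(f"genome {name!r} is shorter than read_len")

    # Weight each genome by copy number times length, so that a read is
    # equally likely to start from any base present in the community.
    weights = [abundances[n] * len(genomes[n]) for n in names]

    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        fragment = seq[start:start + read_len]

        # Assumed error model: independent substitutions at a fixed rate.
        bases = []
        for base in fragment:
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            bases.append(base)
        reads.append((name, start, "".join(bases)))
    return reads


# Toy community: two random "genomes", the second three times as abundant.
_rng = random.Random(42)
toy_genomes = {
    "genome_A": "".join(_rng.choice("ACGT") for _ in range(2000)),
    "genome_B": "".join(_rng.choice("ACGT") for _ in range(1000)),
}
toy_abundances = {"genome_A": 1.0, "genome_B": 3.0}

for name, start, read in simulate_reads(toy_genomes, toy_abundances, n_reads=3):
    print(name, start, read)
```

Because each read keeps its source genome and start position, a data set generated this way has a known ground truth, which is what makes simulated reads useful for benchmarking.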

Conclusions/Significance: MetaSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.

Citing Articles

Computational Study Protocol: Leveraging Synthetic Data to Validate a Benchmark Study for Differential Abundance Tests for 16S Microbiome Sequencing Data.

Kohnert E, Kreutz C F1000Res. 2025; 13:1180.

PMID: 39866725 PMC: 11757917. DOI: 10.12688/f1000research.155230.2.


Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery.

Rocha U, Kasmanas J, Toscan R, Sanches D, Magnusdottir S, Saraiva J PLoS Comput Biol. 2024; 20(10):e1012530.

PMID: 39436938 PMC: 11530072. DOI: 10.1371/journal.pcbi.1012530.


Simulated High Throughput Sequencing Datasets: A Crucial Tool for Validating Bioinformatic Pathogen Detection Pipelines.

Espindola A Biology (Basel). 2024; 13(9).

PMID: 39336128 PMC: 11428249. DOI: 10.3390/biology13090700.


IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning.

Yin H, Wu S, Tan J, Guo Q, Li M, Guo J Gigascience. 2024; 13.

PMID: 38649300 PMC: 11034026. DOI: 10.1093/gigascience/giae018.


Boquila: NGS read simulator to eliminate read nucleotide bias in sequence analysis.

Akkose U, Adebali O Turk J Biol. 2023; 47(2):158-163.

PMID: 37529166 PMC: 10387831. DOI: 10.55730/1300-0152.2650.

