
Evaluating Characteristics of De Novo Assembly Software on 454 Transcriptome Data: a Simulation Approach

Overview
Journal PLoS One
Date 2012 Mar 3
PMID 22384018
Citations 48
Abstract

Background: The quantity of transcriptome data for non-model organisms is rapidly increasing. As sequencing technology advances, the focus shifts towards solving bioinformatic challenges, of which sequence read assembly is the first task. Recent studies have compared the performance of different software to establish a best practice for transcriptome assembly. Here, we adapted a simulation approach to evaluate specific features of assembly programs on 454 data. The novelty of our study is that the simulation allows us to calculate a model assembly as a reference point for comparison.

Findings: The simulation approach allows us to compare basic metrics of assemblies computed by different software applications (CAP3, MIRA, Newbler, and Oases) to a known optimal solution. We found that MIRA and CAP3 are conservative in merging reads, which resulted in a comparatively high number of short contigs. In contrast, Newbler more readily merged reads into longer contigs, while Oases produced the overall shortest assembly. Because the reads were simulated, they could be traced back to their correct placement within the transcriptome. Together with mapping reads onto the assembled contigs, this allowed us to evaluate ambiguity in the assemblies. This analysis further supported the conservative nature of MIRA and CAP3, which resulted in low proportions of chimeric contigs but high redundancy. Newbler produced less redundancy, but its proportion of chimeric contigs was higher.
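The ambiguity evaluation described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes two hypothetical lookup tables (`read_origin`, giving each simulated read's true source transcript, and `read_contig`, giving the contig each read maps to) and uses simplified working definitions that the abstract does not specify: a contig is counted as chimeric if its reads come from more than one transcript, and a transcript is counted as redundantly assembled if its reads are spread over more than one contig.

```python
from collections import defaultdict


def evaluate_ambiguity(read_origin, read_contig):
    """Estimate chimerism and redundancy of an assembly of simulated reads.

    read_origin: dict, read id -> true source transcript id (hypothetical input,
                 known only because the reads were simulated).
    read_contig: dict, read id -> contig id the read maps to (hypothetical input).
    The definitions used here are illustrative assumptions, not the paper's.
    """
    contig_sources = defaultdict(set)      # contig -> transcripts contributing reads
    transcript_contigs = defaultdict(set)  # transcript -> contigs holding its reads

    for read, contig in read_contig.items():
        source = read_origin[read]
        contig_sources[contig].add(source)
        transcript_contigs[source].add(contig)

    # Assumed definition: a chimeric contig mixes reads from several transcripts.
    chimeric = {c for c, srcs in contig_sources.items() if len(srcs) > 1}
    # Assumed definition: a redundant transcript is split over several contigs.
    redundant = {t for t, ctgs in transcript_contigs.items() if len(ctgs) > 1}

    n_contigs = len(contig_sources)
    n_transcripts = len(transcript_contigs)
    return {
        "chimeric_fraction": len(chimeric) / n_contigs if n_contigs else 0.0,
        "redundant_fraction": len(redundant) / n_transcripts if n_transcripts else 0.0,
    }


# Toy data: transcript tA is split over contigs c1 and c2 (redundancy),
# and contig c2 mixes reads from tA and tB (chimerism).
toy_origin = {"r1": "tA", "r2": "tA", "r3": "tB", "r4": "tB", "r5": "tC"}
toy_contig = {"r1": "c1", "r2": "c2", "r3": "c2", "r4": "c2", "r5": "c3"}
toy_result = evaluate_ambiguity(toy_origin, toy_contig)
```

In this toy case one of three contigs is chimeric and one of three transcripts is redundantly assembled. Under these assumed definitions, a conservative assembler would tend to lower the first fraction and raise the second, and a liberal one the reverse.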

Conclusion: Our evaluation of four assemblers suggested that MIRA and Newbler slightly outperformed the other programs, while showing contrasting characteristics. Oases did not perform very well on the 454 reads. Our evaluation indicated that the software was either conservative (MIRA) or liberal (Newbler) about merging reads into contigs. This suggests that, when choosing an assembly program, researchers should carefully consider their follow-up analyses and the consequences of the chosen assembly approach.

Citing Articles

Development of transcriptome assembly and SSRs in allohexaploid Brassica with functional annotations and identification of heat-shock proteins for thermotolerance.

Singh K, Kumari P, Yadava D Front Genet. 2022; 13:958217.

PMID: 36186472 PMC: 9524822. DOI: 10.3389/fgene.2022.958217.


Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists.

Naranpanawa D, Chandrasekara C, Bandaranayake P, Bandaranayake A Sci Rep. 2020; 10(1):18236.

PMID: 33106560 PMC: 7588437. DOI: 10.1038/s41598-020-75270-8.


Distinct Gut Virome Profile of Pregnant Women With Type 1 Diabetes in the ENDIA Study.

Kim K, Allen D, Briese T, Couper J, Barry S, Colman P Open Forum Infect Dis. 2019; 6(2):ofz025.

PMID: 30815502 PMC: 6386807. DOI: 10.1093/ofid/ofz025.


High-Throughput Sequencing to Investigate Phytopathogenic Fungal Propagules Caught in Baited Insect Traps.

Tremblay E, Kimoto T, Berube J, Bilodeau G J Fungi (Basel). 2019; 5(1).

PMID: 30759800 PMC: 6463110. DOI: 10.3390/jof5010015.


De novo transcriptome sequencing and assembly from apomictic and sexual Eragrostis curvula genotypes.

Garbus I, Romero J, Selva J, Pasten M, Chinestra C, Carballo J PLoS One. 2017; 12(11):e0185595.

PMID: 29091722 PMC: 5665505. DOI: 10.1371/journal.pone.0185595.

