A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

Overview
Journal Front Microbiol
Specialty Microbiology
Date 2017 Aug 4
PMID 28769883
Citations 17
Abstract

This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and the regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly corresponded to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties: GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, including active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

Citing Articles

Chromosome-level genome assembly of Phortica okadai, a vector of Thelazia callipaeda.

Wang L, Yu H, Luo B, Yan R, Zhou J, Liu H Sci Data. 2024; 11(1):1370.

PMID: 39695142 PMC: 11655863. DOI: 10.1038/s41597-024-04239-3.


A Genomics-Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in the Potential Novel Strain sp. 21So2-11 Isolated from Antarctic Soil.

Du Y, Han W, Hao P, Hu Y, Hu T, Zeng Y Microorganisms. 2024; 12(6).

PMID: 38930610 PMC: 11205464. DOI: 10.3390/microorganisms12061228.


Genotype diversity of brucellosis agents isolated from humans and animals in Greece based on whole-genome sequencing.

Brangsch H, Sandalakis V, Babetsa M, Boukouvala E, Ntoula A, Makridaki E BMC Infect Dis. 2023; 23(1):529.

PMID: 37580676 PMC: 10426126. DOI: 10.1186/s12879-023-08518-z.


The role of plasmids in carbapenem resistant E. coli in Alameda County, California.

Walas N, Slown S, Amato H, Lloyd T, Bender M, Varghese V BMC Microbiol. 2023; 23(1):147.

PMID: 37217873 PMC: 10201492. DOI: 10.1186/s12866-023-02900-2.


Advances in experimental and computational methodologies for the study of microbial-surface interactions at different omics levels.

Gonzalez-Plaza J, Furlan C, Rijavec T, Lapanje A, Barros R, Tamayo-Ramos J Front Microbiol. 2022; 13:1006946.

PMID: 36519168 PMC: 9744117. DOI: 10.3389/fmicb.2022.1006946.

