PlasClass Improves Plasmid Sequence Classification

Overview
Specialty Biology
Date 2020 Apr 4
PMID 32243433
Citations 43
Abstract

Many bacteria contain plasmids, but distinguishing between contigs that originate from a plasmid and those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained on only a fraction of the known plasmids, and can be difficult to use in practice. We present PlasClass, a new plasmid classifier. It uses a set of standard classifiers, each trained for a different sequence length on the most current set of known plasmid sequences. We tested PlasClass sequence classification on held-out data and simulations, as well as on publicly available bacterial isolates, plasmidome samples, and plasmids assembled from metagenomic samples. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, allowing it to achieve higher F1 scores in classifying sequences from a wide range of datasets. PlasClass also uses significantly less time and memory. PlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available under the MIT license at https://github.com/Shamir-Lab/PlasClass.
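
To illustrate two ideas the abstract relies on, the following is a minimal Python sketch, not PlasClass's actual implementation. It shows how a contig might be routed to a length-specific classifier and how an F1 score is computed for the plasmid class. The length scales in `LENGTH_SCALES`, the nearest-scale rule in `pick_scale`, and both function names are assumptions made for illustration only. The abstract states only that classifiers are trained "for different sequence lengths".

```python
# Hypothetical length scales (in bp); the real values used by PlasClass
# are not given in the abstract.
LENGTH_SCALES = [1_000, 10_000, 100_000, 500_000]


def pick_scale(seq_len, scales=LENGTH_SCALES):
    """Return the length scale closest to seq_len.

    Assumption: each scale has its own trained classifier, and a contig
    is scored by the classifier whose scale is nearest to its length.
    """
    return min(scales, key=lambda s: abs(s - seq_len))


def f1_score(true_labels, predicted_labels, positive="plasmid"):
    """Compute F1 (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A short contig is routed to the smallest hypothetical scale.
    print(pick_scale(800))
    # Toy labels: one true positive, one false positive, one false negative.
    truth = ["plasmid", "plasmid", "chromosome", "chromosome"]
    preds = ["plasmid", "chromosome", "plasmid", "chromosome"]
    print(f1_score(truth, preds))
```

Because short contigs make up most of an assembly, a per-length design of this kind lets the short-sequence model be tuned separately from the long-sequence ones. That is consistent with the abstract's claim of improved F1 on shorter sequences.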

Citing Articles

Genomic analysis of antimicrobial resistant Escherichia coli isolated from manure and manured agricultural grasslands.

Tyrrell C, Burgess C, Brennan F, Munzenmaier D, Drissner D, Leigh R NPJ Antimicrob Resist. 2025; 3(1):8.

PMID: 39900801 PMC: 11790903. DOI: 10.1038/s44259-025-00081-8.


Modern microbiology: Embracing complexity through integration across scales.

Eren A, Banfield J Cell. 2024; 187(19):5151-5170.

PMID: 39303684 PMC: 11450119. DOI: 10.1016/j.cell.2024.08.028.


4CAC: 4-class classifier of metagenome contigs using machine learning and assembly graphs.

Pu L, Shamir R Nucleic Acids Res. 2024; 52(19):e94.

PMID: 39287139 PMC: 11514454. DOI: 10.1093/nar/gkae799.


MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model.

Feng T, Wu S, Zhou H, Fang Z Gigascience. 2024; 13.

PMID: 39101782 PMC: 11299106. DOI: 10.1093/gigascience/giae047.


PlasmidHunter: accurate and fast prediction of plasmid sequences using gene content profile and machine learning.

Tian R, Zhou J, Imanian B Brief Bioinform. 2024; 25(4).

PMID: 38960405 PMC: 11770376. DOI: 10.1093/bib/bbae322.

