
LAMPA, LArge Multidomain Protein Annotator, and Its Application to RNA Virus Polyproteins

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2020 Feb 1
PMID: 32003788
Citations: 7
Abstract

Motivation: To facilitate accurate estimation of the statistical significance of sequence similarity in profile-profile searches, queries should ideally correspond to protein domains. For multidomain proteins, using domains as queries depends on delineating domain borders, which may be unknown. Whole proteins are therefore commonly used as queries, which complicates establishing homology for similarities close to the cutoff levels of statistical significance.

Results: We describe an iterative approach called LAMPA (LArge Multidomain Protein Annotator) that resolves this conundrum by gradually expanding hit coverage of multidomain proteins: at each iteration, it defines ever smaller queries and re-evaluates the statistical significance of hit similarity. LAMPA employs TMHMM to recognize transmembrane regions and HHsearch to recognize homology. We used the Pfam database to annotate 2985 multidomain proteins (polyproteins) of >1000 amino acid residues, which dominate the proteomes of RNA viruses. Under strict cutoffs, LAMPA outperformed HHsearch runs that used intact polyproteins as queries by three measures: the number of identified homologous regions, the coverage by those regions, and the number of hit Pfam profiles. Compared to HHsearch, LAMPA identified 507 extra homologous regions in 14.4% of polyproteins. This Pfam-based annotation by LAMPA was also superior to RefSeq expert annotation by two measures, region number and annotated length, for 69.3% of RNA virus polyprotein entries. We rationalized these results in computational experiments that examined how the statistical significance of an HHsearch hit with a given local alignment similarity score depends on the lengths and diversities of query-target pairs.
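The iterative idea summarized above can be illustrated with a minimal Python sketch. It is not the LAMPA implementation (LAMPA is an R package built on TMHMM and HHsearch); the function names `iterative_annotate` and `toy_search`, the `min_len` parameter, and the toy sequence are all hypothetical. The sketch assumes only that a search on a shorter query may report hits that a search on the full protein misses, and it omits the transmembrane-region step entirely.

```python
from typing import Callable, List, Tuple

# A hit is (start, end, profile_name) in 0-based, half-open coordinates.
Hit = Tuple[int, int, str]


def iterative_annotate(
    sequence: str,
    search: Callable[[str], List[Hit]],
    min_len: int = 5,
    max_iter: int = 10,
) -> List[Hit]:
    """Search the whole protein, then re-search the uncovered gaps.

    `search` is a stand-in for a homology search: it takes a query string
    and returns hits in query-local coordinates.
    """
    annotated: List[Hit] = []
    queue = [(0, len(sequence))]  # regions still to be used as queries
    for _ in range(max_iter):
        if not queue:
            break
        next_queue = []
        for start, end in queue:
            local_hits = search(sequence[start:end])
            if not local_hits:
                continue  # nothing found, so no smaller queries are defined
            # Convert query-local coordinates to protein coordinates.
            hits = sorted((start + s, start + e, name) for s, e, name in local_hits)
            annotated.extend(hits)
            # Uncovered gaps between hits become the next, smaller queries.
            cursor = start
            for s, e, _ in hits:
                if s - cursor >= min_len:
                    next_queue.append((cursor, s))
                cursor = max(cursor, e)
            if end - cursor >= min_len:
                next_queue.append((cursor, end))
        queue = next_queue
    return sorted(annotated)


def toy_search(query: str) -> List[Hit]:
    """Toy stand-in: the 'C_domain' hit is reported only for queries
    shorter than 40 residues, mimicking a length-dependent significance."""
    hits: List[Hit] = []
    pos = query.find("AAAAA")
    if pos != -1:
        hits.append((pos, pos + 5, "A_domain"))
    pos = query.find("CCCCC")
    if pos != -1 and len(query) < 40:
        hits.append((pos, pos + 5, "C_domain"))
    return hits


# Hypothetical 50-residue "protein" with two planted regions.
toy_protein = "G" * 10 + "A" * 5 + "G" * 20 + "C" * 5 + "G" * 10
whole_protein_hits = toy_search(toy_protein)
iterative_hits = iterative_annotate(toy_protein, toy_search)
```

In this toy setting, the single whole-protein search reports only the `A_domain` region, whereas the iterative run also recovers `C_domain` once the region downstream of the first hit is searched as a shorter query.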

Availability and Implementation: The LAMPA 1.0.0 R package is available on GitHub (https://github.com/Gorbalenya-Lab/LAMPA).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Infectious bovine rhinotracheitis: Unveiling the hidden threat to livestock productivity and global trade.

Rimayanti R, Khairullah A, Lestari T, Moses I, Utama S, Damayanti R. Open Vet J. 2024; 14(10):2525-2538.

PMID: 39545192 PMC: 11560271. DOI: 10.5455/OVJ.2024.v14.i10.3.


VOGDB-Database of Virus Orthologous Groups.

Trgovec-Greif L, Hellinger H, Mainguy J, Pfundner A, Frishman D, Kiening M. Viruses. 2024; 16(8).

PMID: 39205165 PMC: 11360334. DOI: 10.3390/v16081191.


Deep mining of the Sequence Read Archive reveals major genetic innovations in coronaviruses and other nidoviruses of aquatic vertebrates.

Lauber C, Zhang X, Vaas J, Klingler F, Mutz P, Dubin A. PLoS Pathog. 2024; 20(4):e1012163.

PMID: 38648214 PMC: 11065284. DOI: 10.1371/journal.ppat.1012163.


A second type of N7-guanine RNA cap methyltransferase in an unusual locus of a large RNA virus genome.

Shannon A, Sama B, Gauffre P, Guez T, Debart F, Vasseur J. Nucleic Acids Res. 2022; 50(19):11186-11198.

PMID: 36265859 PMC: 9638943. DOI: 10.1093/nar/gkac876.


Opportunities and Challenges of Data-Driven Virus Discovery.

Lauber C, Seitz S. Biomolecules. 2022; 12(8).

PMID: 36008967 PMC: 9406072. DOI: 10.3390/biom12081073.

