
Automated Genome Sequence Analysis and Annotation

Overview
Journal Bioinformatics
Specialty Biology
Date 1999 Jun 15
PMID 10366660
Citations 57
Abstract

Motivation: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming.

Results: We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner.

Availability: The GeneQuiz system is publicly available for analysis of protein sequences through a Web server at http://www.sander.ebi.ac.uk/gqsrv/submit
