Evaluation of Human-readable Annotation in Biomolecular Sequence Databases with Biological Rule Libraries
Motivation: Computer-based selection of entries from sequence databases with respect to a related functional description, e.g. a common cellular localization or a contribution to the same phenotypic function, is a difficult task. Automatic semantic analysis of annotations is hampered not only by incomplete functional assignments. A major problem is that annotations are written in a rich, non-formalized language and are meant to be read by a human expert. Such an expert can extract considerably more information from the text than is immediately apparent, drawing on extensive biological background knowledge and logical reasoning.
Approach: A technique for automated annotation evaluation, based on a combination of lexical analysis and the use of biological rule libraries, has been developed. The proposed algorithm generates new functional descriptors from the annotation of a given entry, using the semantic units of the annotation as premises for implications executed in accordance with the rule library.
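The idea of treating semantic units as the premises of implications can be illustrated with a minimal sketch. It is not the Meta_A implementation: the rule entries, the function names (`extract_units`, `infer_descriptors`, `annotate`) and the simple term-matching step standing in for lexical analysis are all assumptions made for illustration.

```python
import re

# Hypothetical rule library (illustrative only, not taken from Meta_A):
# each rule is (set of premise terms, derived descriptor).
RULES = [
    (frozenset({"histone"}), "nuclear"),
    (frozenset({"nuclear"}), "intracellular"),
    (frozenset({"signal peptide"}), "secretory pathway"),
]


def extract_units(annotation, vocabulary):
    """Crude stand-in for lexical analysis: find known terms in the text."""
    text = annotation.lower()
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    }


def infer_descriptors(units, rules):
    """Apply rules repeatedly until no new descriptor can be derived."""
    known = set(units)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    # Return only descriptors that were not already present in the annotation.
    return known - set(units)


def annotate(annotation, rules=RULES):
    """Derive new descriptors for one annotation string."""
    vocabulary = {term for premises, _ in rules for term in premises}
    return infer_descriptors(extract_units(annotation, vocabulary), rules)


derived = annotate("Histone H2A; core component of nucleosome.")
print(sorted(derived))
```

In this sketch the rules chain: the term "histone" yields "nuclear", which in turn yields "intracellular". A real rule library would also need to handle negation, synonyms and conflicting evidence, which are left out here.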
Results: The prototype of a software system, the Meta_A(nnotator) program, is described and the results of its application to sequence attribute assignment and sequence selection problems, such as cellular localization and sequence domain annotation of SWISS-PROT entries, are presented. The current software version assigns useful subcellular localization qualifiers to approximately 88% of all SWISS-PROT entries. As shown by demonstrative examples, the combination of sequence and annotation analysis is a powerful approach for the detection of mutual annotation/sequence inconsistencies.
Availability: Results for the cellular localization assignment can be viewed at the URL http://www.bork.embl-heidelberg.de/CELL_LOC/CELL_LOC.html.