
Assessment of Genome-wide Protein Function Classification for Drosophila melanogaster

Overview
Journal: Genome Res
Specialty: Genetics
Date: 2003 Sep 4
PMID: 12952880
Citations: 19
Abstract

The functional classification of genes on a genome-wide scale is now in its infancy, and we make a first attempt to assess existing methods and identify sources of error. To this end, we compared two independent efforts for associating proteins with functions, one implemented by FlyBase and the other by PANTHER at Celera Genomics. Both methods make inferences based on sequence similarity and the available experimental evidence. However, they differ considerably in methodology and process. Overall, assuming that the systematic error across the two methods is relatively small, we find the protein-to-function association error rate of both the FlyBase and PANTHER methods to be <2%. The primary source of error for both methods appears to be simple human error. Although homology-based inference can certainly cause errors in annotation, our analysis indicates that the frequency of such errors is relatively small compared with the number of correct inferences. Moreover, these homology errors can be minimized by careful tree-based inference, such as that implemented in PANTHER. Often, functional associations are made by one method and not the other, indicating that one of the greatest challenges lies in improving the completeness of available ontology associations.

Citing Articles

PANTHER: Making genome-scale phylogenetics accessible to all.

Thomas P, Ebert D, Muruganujan A, Mushayahama T, Albou L, Mi H. Protein Sci. 2021; 31(1):8-22.

PMID: 34717010 PMC: 8740835. DOI: 10.1002/pro.4218.


Large-scale gene function analysis with the PANTHER classification system.

Mi H, Muruganujan A, Casagrande J, Thomas P. Nat Protoc. 2013; 8(8):1551-66.

PMID: 23868073 PMC: 6519453. DOI: 10.1038/nprot.2013.092.


A threading-based method for the prediction of DNA-binding proteins with application to the human genome.

Gao M, Skolnick J. PLoS Comput Biol. 2009; 5(11):e1000567.

PMID: 19911048 PMC: 2770119. DOI: 10.1371/journal.pcbi.1000567.


FINDSITE: a combined evolution/structure-based approach to protein function prediction.

Skolnick J, Brylinski M. Brief Bioinform. 2009; 10(4):378-91.

PMID: 19324930 PMC: 2691936. DOI: 10.1093/bib/bbp017.


A Drosophila systems model of pentylenetetrazole induced locomotor plasticity responsive to antiepileptic drugs.

Mohammad F, Singh P, Sharma A. BMC Syst Biol. 2009; 3:11.

PMID: 19154620 PMC: 2657775. DOI: 10.1186/1752-0509-3-11.

