No Wisdom in the Crowd: Genome Annotation in the Era of Big Data - Current Status and Future Prospects

Overview
Date 2018 May 29
PMID 29806194
Citations 32
Authors
Danchin A, Ouzounis C, Tokuyasu T, Zucker J
Abstract

Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.

Citing Articles

Combining DNA and protein alignments to improve genome annotation with LiftOn.

Chao K, Heinz J, Hoh C, Mao A, Shumate A, Pertea M Genome Res. 2024; 35(2):311-325.

PMID: 39730188 PMC: 11874971. DOI: 10.1101/gr.279620.124.


Artificial intelligence-based prediction of pathogen emergence and evolution in the world of synthetic biology.

Danchin A Microb Biotechnol. 2024; 17(10):e70014.

PMID: 39364593 PMC: 11450380. DOI: 10.1111/1751-7915.70014.


Multi-omics analysis reveals genes and metabolites involved in Streptococcus suis biofilm formation.

Wang H, Fan Q, Wang Y, Yi L, Wang Y BMC Microbiol. 2024; 24(1):297.

PMID: 39127666 PMC: 11316374. DOI: 10.1186/s12866-024-03448-5.


Decoding microbial genomes to understand their functional roles in human complex diseases.

Wang Y, Dong Q, Hu S, Zou H, Wu T, Shi J Imeta. 2024; 1(2):e14.

PMID: 38868571 PMC: 10989872. DOI: 10.1002/imt2.14.


Combining DNA and protein alignments to improve genome annotation with LiftOn.

Chao K, Heinz J, Hoh C, Mao A, Shumate A, Pertea M bioRxiv. 2024; .

PMID: 38798552 PMC: 11118573. DOI: 10.1101/2024.05.16.593026.


References
1. Medigue C, Viari A, Henaut A, Danchin A. Escherichia coli molecular genetic map (1500 kbp): update II. Mol Microbiol. 1991; 5(11):2629-40. DOI: 10.1111/j.1365-2958.1991.tb01972.x.

2. Danchin A, Ouzounis C, Tokuyasu T, Zucker J. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol. 2018; 11(4):588-605. PMC: 6011933. DOI: 10.1111/1751-7915.13284.

3. Prelec D, Seung H, McCoy J. A solution to the single-question crowd wisdom problem. Nature. 2017; 541(7638):532-535. DOI: 10.1038/nature21054.

4. Bianchi V, Ceol A, Ogier A, de Pretis S, Galeota E, Kishore K. Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions. Front Genet. 2016; 7:75. PMC: 4858535. DOI: 10.3389/fgene.2016.00075.

5. Medigue C, Bouche J, Henaut A, Danchin A. Mapping of sequenced genes (700 kbp) in the restriction map of the Escherichia coli chromosome. Mol Microbiol. 1990; 4(2):169-87. DOI: 10.1111/j.1365-2958.1990.tb00585.x.