JIGSAW: Integration of Multiple Sources of Evidence for Gene Prediction

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2005 Aug 4
PMID: 16076884
Citations: 65
Abstract

Motivation: Computational gene finding systems play an important role in finding new human genes, although no systems are yet accurate enough to predict all or even most protein-coding regions perfectly. Ab initio programs can be augmented by evidence such as expression data or protein sequence homology, which improves their performance. The amount of such evidence continues to grow, but computational methods continue to have difficulty predicting genes when the evidence is conflicting or incomplete. Genome annotation pipelines collect a variety of types of evidence about gene structure and synthesize the results, which can then be refined further through manual, expert curation of gene models.

Results: JIGSAW is a new gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming. Our results show that JIGSAW's performance is superior to ab initio gene finding methods and to other pipelines such as Ensembl. Even without evidence from alignment to known genes, JIGSAW can substantially improve gene prediction accuracy as compared with existing methods.

Availability: JIGSAW is available as an open source software package at http://cbcb.umd.edu/software/jigsaw.
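The two steps named in the Results (weighting evidence sources from a training set, then combining them by dynamic programming) can be illustrated with a toy sketch. This is not JIGSAW's actual model or code: the two-state labeling (`N` for non-coding, `C` for coding), the per-source accuracy weights, the function names `train_weights` and `combine`, and the `switch_penalty` parameter are all assumptions chosen for illustration.

```python
from math import log

STATES = ("N", "C")  # hypothetical labels: non-coding, coding


def train_weights(training_evidence, truth):
    """Estimate, per evidence source, how often its call matches the curated label."""
    weights = {}
    for source, calls in training_evidence.items():
        matches = sum(1 for call, label in zip(calls, truth) if call == label)
        # Laplace smoothing keeps each accuracy strictly between 0 and 1.
        weights[source] = (matches + 1) / (len(truth) + 2)
    return weights


def combine(evidence, weights, switch_penalty=2.0):
    """Viterbi-style dynamic programming over positions.

    Each source votes for a state with log-odds derived from its trained
    accuracy; changing state between adjacent positions costs switch_penalty.
    """
    length = len(next(iter(evidence.values())))

    def emit(pos, state):
        score = 0.0
        for source, calls in evidence.items():
            accuracy = weights[source]
            score += log(accuracy) if calls[pos] == state else log(1 - accuracy)
        return score

    def step_cost(prev, state):
        return switch_penalty if prev != state else 0.0

    best = {state: emit(0, state) for state in STATES}
    back = []  # back[i][state] = best predecessor of state at position i + 1
    for pos in range(1, length):
        new_best, pointers = {}, {}
        for state in STATES:
            prev = max(STATES, key=lambda p: best[p] - step_cost(p, state))
            new_best[state] = best[prev] - step_cost(prev, state) + emit(pos, state)
            pointers[state] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final state to recover the full labeling.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    truth = "NNCCCCNN"
    training = {"good": "NNCCCCNN", "noisy": "NCNCNCNC"}
    weights = train_weights(training, truth)
    print(weights)
    print(combine({"good": "NNNCCCNN", "noisy": "CNCNCNCN"}, weights))
```

In this sketch a source that agrees with the training labels only half the time receives a weight of 0.5 and so contributes equally to both states, which means conflicting evidence from it does not change the combined prediction.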

Citing Articles

GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Bruna T, Lomsadze A, Borodovsky M. Genome Res. 2024; 34(5):757-768.

PMID: 38866548. PMC: 11216313. DOI: 10.1101/gr.278373.123.

Boosting grapevine breeding for climate-smart viticulture: from genetic resources to predictive genomics.

Magon G, de Rosa V, Martina M, Falchi R, Acquadro A, Barcaccia G. Front Plant Sci. 2023; 14:1293186.

PMID: 38148866. PMC: 10750425. DOI: 10.3389/fpls.2023.1293186.

An improved reference of the grapevine genome reasserts the origin of the PN40024 highly homozygous genotype.

Velt A, Frommer B, Blanc S, Holtgrawe D, Duchene E, Dumas V. G3 (Bethesda). 2023; 13(5).

PMID: 36966465. PMC: 10151409. DOI: 10.1093/g3journal/jkad067.

A new gene finding tool GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Bruna T, Lomsadze A, Borodovsky M. bioRxiv. 2023.

PMID: 36711453. PMC: 9882169. DOI: 10.1101/2023.01.13.524024.

TSEBRA: transcript selector for BRAKER.

Gabriel L, Hoff K, Bruna T, Borodovsky M, Stanke M. BMC Bioinformatics. 2021; 22(1):566.

PMID: 34823473. PMC: 8620231. DOI: 10.1186/s12859-021-04482-0.