
Prediction of Complete Gene Structures in Human Genomic DNA

Overview
Journal: J Mol Biol
Publisher: Elsevier
Date: 1997 Apr 25
PMID: 9149143
Citations: 1595
Abstract

We introduce a general probabilistic model of the gene structure of human genomic sequences which incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived to account for the many substantial differences in gene density and structure observed in distinct C + G compositional regions of the human genome. In addition, new models of the donor and acceptor splice signals are described which capture potentially important dependencies between signal positions. The model is applied to the problem of gene identification in a computer program, GENSCAN, which identifies complete exon/intron structures of genes in genomic DNA. Novel features of the program include the capacity to predict multiple genes in a sequence, to deal with partial as well as complete genes, and to predict consistent sets of genes occurring on either or both DNA strands. GENSCAN is shown to have substantially higher accuracy than existing methods when tested on standardized sets of human and vertebrate genes, with 75 to 80% of exons identified exactly. The program is also capable of indicating fairly accurately the reliability of each predicted exon. Consistently high levels of accuracy are observed for sequences of differing C + G content and for distinct groups of vertebrates.
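The exon-level figure quoted above ("75 to 80% of exons identified exactly") can be illustrated with a minimal sketch. It assumes the conventional exact-match definitions used in gene-finder evaluation: sensitivity is the fraction of annotated exons whose boundaries are both predicted exactly, and specificity is the fraction of predicted exons that match an annotated exon exactly. The abstract does not spell out these definitions, and the `Exon` tuple, the function name `exon_level_accuracy`, and the coordinates below are hypothetical illustrations, not GENSCAN output.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation: (start, end, strand).
Exon = Tuple[int, int, str]


def exon_level_accuracy(
    annotated: Iterable[Exon], predicted: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) under exact-boundary matching.

    An exon counts as correct only if start, end, and strand all agree
    with an annotated exon; partial overlaps score as misses.
    """
    true_set = set(annotated)
    pred_set = set(predicted)
    exact = true_set & pred_set

    sensitivity = len(exact) / len(true_set) if true_set else 0.0
    specificity = len(exact) / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


# Made-up coordinates for illustration only.
annotated = [(100, 200, "+"), (300, 450, "+"), (600, 720, "+"), (900, 1000, "+")]
predicted = [
    (100, 200, "+"),
    (300, 450, "+"),
    (610, 720, "+"),    # wrong start: overlaps but is not an exact match
    (900, 1000, "+"),
    (1200, 1300, "+"),  # spurious prediction
]

sn, sp = exon_level_accuracy(annotated, predicted)
print(f"exon-level sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

Under this strict criterion, a prediction that misplaces a single splice site by a few bases is scored the same as a missed exon, which is why exact exon-level accuracy is a demanding measure.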

Citing Articles

A near-complete genome assembly of Fragaria iinumae.

Du H, He Y, Chen M, Zheng X, Gui D, Tang J. BMC Genomics. 2025; 26(1):253.

PMID: 40087556 DOI: 10.1186/s12864-025-11440-0.


Revealing Genomic Traits and Evolutionary Insights of Oryza officinalis from Southern China Through Genome Assembly and Transcriptome Analysis.

Chen C, Hu H, Guo H, Xia X, Zhang Z, Nong B. Rice (N Y). 2025; 18(1):15.

PMID: 40082317 PMC: 11906960. DOI: 10.1186/s12284-025-00769-5.


Genomic insights into ecological adaptation of oaks revealed by phylogenomic analysis of multiple species.

Wang T, Ning X, Zheng S, Li Y, Lu Z, Meng H. Plant Divers. 2025; 47(1):53-67.

PMID: 40041560 PMC: 11873581. DOI: 10.1016/j.pld.2024.07.008.


An Extensive Survey of Vertebrate-specific, Nonvisual Opsins Identifies a Novel Subfamily, Q113-Bistable Opsin.

Gyoja F, Sato K, Yamashita T, Kusakabe T. Genome Biol Evol. 2025; 17(3).

PMID: 40036976 PMC: 11893379. DOI: 10.1093/gbe/evaf032.


Multiomics analysis provides insights into musk secretion in muskrat and musk deer.

Wang T, Yang M, Shi X, Tian S, Li Y, Xie W. Gigascience. 2025; 14.

PMID: 40036429 PMC: 11878540. DOI: 10.1093/gigascience/giaf006.