
The K(A)/K(S) Ratio Test for Assessing the Protein-coding Potential of Genomic Regions: an Empirical and Simulation Study

Overview
Journal Genome Res
Specialty Genetics
Date 2002 Jan 10
PMID 11779845
Citations 156
Abstract

Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for identifying protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that, in the vast majority of coding regions, synonymous substitutions (K(S)) occur much more frequently than nonsynonymous ones (K(A)), and it uses the K(A)/K(S) ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited to the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than that of most current gene-prediction methods and a false-positive rate lower than that of all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
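To make the K(A)/K(S) criterion concrete, the following is a minimal sketch of how such a ratio could be computed from a pair of aligned, in-frame sequences. It is not the procedure used in the study: the abstract does not specify the estimator, so a simplified Nei-Gojobori-style count with a Jukes-Cantor correction is assumed here. The function names (`syn_sites`, `jukes_cantor`, `ka_ks`) are hypothetical, and codons that contain gaps, stop codons, or more than one difference are simply skipped, which a real analysis would handle more carefully.

```python
from itertools import product
from math import log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in a codon: for each position, the fraction of the
    three possible single-base changes that leave the amino acid unchanged.
    Assumption: changes to a stop codon count as nonsynonymous."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def jukes_cantor(diffs, sites):
    """Jukes-Cantor distance from a count of differences and of sites.
    Returns None when the proportion is too high for the correction."""
    if sites == 0 or diffs == 0:
        return 0.0
    p = diffs / sites
    if p >= 0.75:
        return None
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (K_A, K_S, K_A/K_S) for two aligned in-frame sequences.
    The ratio is None when K_S is zero or either distance is undefined."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    s_total = n_total = 0.0
    s_diffs = n_diffs = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # gap or ambiguous base
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # stop codon
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff > 1:
            continue  # simplification: multi-hit codons are ignored
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        s_total += s
        n_total += 3 - s
        if ndiff == 1:
            if CODE[c1] == CODE[c2]:
                s_diffs += 1
            else:
                n_diffs += 1
    ks = jukes_cantor(s_diffs, s_total)
    ka = jukes_cantor(n_diffs, n_total)
    ratio = ka / ks if ka is not None and ks else None
    return ka, ks, ratio


# Toy example: one synonymous change (CTT -> CTC, both leucine).
ka, ks, ratio = ka_ks("CTTCTTCTT", "CTCCTTCTT")
print(ka, ks, ratio)
```

Under the premise stated in the abstract, a true coding region would be expected to give a ratio well below 1, whereas a region evolving without protein-level constraint would not. Deciding whether an observed ratio is significantly low requires a statistical test that this sketch does not attempt.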

Citing Articles

Genome-wide identification, characterization, and functional analysis of the CHX, SOS, and RLK genes in Solanum lycopersicum under salt stress.

Maghraby A, Alzalaty M Sci Rep. 2025; 15(1):1142.

PMID: 39774029 PMC: 11707246. DOI: 10.1038/s41598-024-83221-w.


Genome-wide identification, evolution and expression analysis unveil the role of genes in nitrogen utilization and nitrogen allocation.

Wang B, Ren S, Chen S, Hao S, Xu G, Hu S Physiol Mol Biol Plants. 2025; 30(12):1983-1999.

PMID: 39744329 PMC: 11685372. DOI: 10.1007/s12298-024-01541-7.


Genome-wide identification, characterization and expression profiles of FORMIN gene family in cotton (Gossypium Raimondii L.).

Shing P, Islam M, Khatun M, Zohra F, Hasan N, Rahman S BMC Genom Data. 2024; 25(1):105.

PMID: 39695391 PMC: 11657977. DOI: 10.1186/s12863-024-01285-z.


Describing and characterizing the gene family across plant species: a systematic review.

Harvey A, van den Berg N, Swart V Front Plant Sci. 2024; 15:1467148.

PMID: 39600901 PMC: 11588464. DOI: 10.3389/fpls.2024.1467148.


Genome-wide identification of CAMTA genes and their expression dependence on light and calcium signaling during seedling growth and development in mung bean.

Wicaksono A, Buaboocha T BMC Genomics. 2024; 25(1):992.

PMID: 39443876 PMC: 11515718. DOI: 10.1186/s12864-024-10893-z.

