
Size Distribution of Function-based Human Gene Sets and the Split-merge Model

Overview
Journal: R Soc Open Sci
Specialty: Science
Date: 2016 Nov 18
PMID: 27853602
Citations: 3
Abstract

The sizes of paralogue families (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families in which genes are grouped by a similar function or share a common property. The size distribution of HUGO Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and is fitted much better by a beta rank function. We propose a simple mechanism that breaks a power-law size distribution through a combination of splitting and merging operations. The largest gene sets are split into two to account for subfunctional categories, and a small proportion of the other gene sets are merged into larger sets as new common themes are recognized. Such operations are not uncommon for curators of gene sets. A simulation shows that iterating these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function with the distribution of transcription factors and drug target genes among HGNC gene families.
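The abstract combines two technical ideas: a beta rank function fitted to ranked gene-set sizes, and an iterated split-merge process that bends a power law away from a straight line on a log-log rank plot. The Python sketch below illustrates both under stated assumptions. It assumes the commonly used form of the beta rank function, f(r) = A (N + 1 - r)^b / r^a for ranks r = 1, ..., N, which reduces to a pure power law in rank when b = 0. It fits A, a and b by ordinary least squares on log-transformed sizes, which may not be the authors' fitting procedure. The split and merge rules (split only the single largest set into two near-equal halves; absorb each other set, with probability `merge_prob`, into a randomly chosen higher-ranked set) are hypothetical, because the abstract does not specify them. The function names, `merge_prob`, and the Zipf-like starting sizes are illustrative only and are not data from the paper.

```python
import math
import random


def beta_rank_function(r, n, a, b, amp=1.0):
    """Assumed beta rank function f(r) = amp * (n + 1 - r)**b / r**a, r = 1..n.

    With b = 0 this is a pure power law in rank (Zipf-like).
    """
    return amp * (n + 1 - r) ** b / r ** a


def fit_beta_rank_function(sizes):
    """Fit (amp, a, b) by least squares on the log-linear form
    log f = log(amp) + b*log(n + 1 - r) - a*log(r).

    Sizes must be positive and there must be at least 3 of them.
    They are ranked largest first before fitting.
    """
    ys = sorted(sizes, reverse=True)
    n = len(ys)
    # Design-matrix rows; coefficients are (log amp, b, a).
    rows = [(1.0, math.log(n + 1 - r), -math.log(r)) for r in range(1, n + 1)]
    logy = [math.log(y) for y in ys]
    # Normal equations (X^T X) beta = X^T y.
    m = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    v = [sum(row[i] * t for row, t in zip(rows, logy)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        v[col], v[piv] = v[piv], v[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [p - f * q for p, q in zip(m[k], m[col])]
                v[k] -= f * v[col]
    c, b, a = (v[i] / m[i][i] for i in range(3))
    return math.exp(c), a, b


def split_merge_step(sizes, merge_prob, rng):
    """One round of hypothetical split and merge rules (not the paper's spec).

    1. Split: the largest set is divided into two near-equal parts.
    2. Merge: each other set, with probability merge_prob, is absorbed
       into a randomly chosen set ranked above it.
    The total number of genes is conserved.
    """
    s = sorted(sizes, reverse=True)
    top = s.pop(0)
    if top >= 2:
        s += [top // 2, top - top // 2]
    else:
        s.append(top)
    s.sort(reverse=True)
    alive = [True] * len(s)
    # Visit the smallest sets first; index 0 is never merged away.
    for i in range(len(s) - 1, 0, -1):
        if rng.random() < merge_prob:
            j = rng.randrange(i)  # any higher-ranked (larger or equal) set
            s[j] += s[i]
            alive[i] = False
    return sorted((x for x, keep in zip(s, alive) if keep), reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical Zipf-like starting sizes, standing in for paralogue families.
    sizes = [max(1, 1000 // r) for r in range(1, 301)]
    for _ in range(100):
        sizes = split_merge_step(sizes, merge_prob=0.003, rng=rng)
    amp, a, b = fit_beta_rank_function(sizes)
    print(f"sets={len(sizes)} genes={sum(sizes)} A={amp:.1f} a={a:.2f} b={b:.2f}")
```

With `merge_prob` near 1/N, about one merge offsets each split, so the number of sets stays roughly stable across iterations. A fitted `b` noticeably different from 0 signals a departure from a pure power law in rank.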

Citing Articles

The hitchhikers' guide to RNA sequencing and functional analysis.

Chen J, Shrestha L, Green G, Leier A, Marquez-Lago T. Brief Bioinform. 2023; 24(1).

PMID: 36617463 PMC: 9851315. DOI: 10.1093/bib/bbac529.


Reusable building blocks in biological systems.

Mireles V, Conrad T. J R Soc Interface. 2018; 15(149):20180595.

PMID: 30958230 PMC: 6303794. DOI: 10.1098/rsif.2018.0595.


Population patterns in World's administrative units.

Fontanelli O, Miramontes P, Cocho G, Li W. R Soc Open Sci. 2017; 4(7):170281.

PMID: 28791153 PMC: 5541548. DOI: 10.1098/rsos.170281.
