
An Expanded Sequence Context Model Broadly Explains Variability in Polymorphism Levels Across the Human Genome

Overview
Journal Nat Genet
Specialty Genetics
Date 2016 Feb 16
PMID 26878723
Citations 105
Abstract

The rate of single-nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and the incidence of genetic disease. Previous studies have considered only the immediately flanking nucleotides around a polymorphic site (the site's trinucleotide sequence context) to study polymorphism levels across the genome. Moreover, the impact of larger sequence contexts has not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of variability in substitution probabilities, highlighting new mutation-promoting motifs at ApT dinucleotides and at CAAT and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites, which is not immediately explained by differential methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate clinical use of the model in neuropsychiatric diseases.
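To make the idea of a sequence-context model concrete, the sketch below estimates, for every k-mer centered on a reference base, the fraction of its occurrences at which the central base is polymorphic. Setting k = 7 gives a heptanucleotide context and k = 3 the trinucleotide context used by earlier studies. This is a minimal count-based illustration under stated assumptions, not the authors' statistical framework. The function names `canonical` and `context_substitution_probs` are hypothetical. Collapsing each k-mer with its reverse complement so that the central base is A or C is an assumed convention. Unlike a full substitution model, the sketch does not distinguish which alternative base is observed (for example, C-to-T versus C-to-A).

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Collapse a k-mer and its reverse complement to a single key.

    Assumed convention: keep the strand whose central base is A or C.
    """
    mid = len(kmer) // 2
    if kmer[mid] in "AC":
        return kmer
    return kmer.translate(COMPLEMENT)[::-1]  # reverse complement


def context_substitution_probs(seq: str, polymorphic: set[int], k: int = 7) -> dict[str, float]:
    """Fraction of sites with each k-mer context whose central base is polymorphic.

    `polymorphic` holds 0-based positions in `seq` observed to be polymorphic
    (hypothetical input; on real data these might come from a variant call set).
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the context is centered on one base")
    flank = k // 2
    totals: Counter[str] = Counter()  # occurrences of each context
    hits: Counter[str] = Counter()    # occurrences with a polymorphic central base
    for i in range(flank, len(seq) - flank):
        kmer = seq[i - flank : i + flank + 1].upper()
        if set(kmer) - set("ACGT"):
            continue  # skip contexts containing N or other ambiguity codes
        key = canonical(kmer)
        totals[key] += 1
        if i in polymorphic:
            hits[key] += 1
    return {key: hits[key] / n for key, n in totals.items()}
```

For example, `context_substitution_probs("AACAACAAC", {2}, k=3)` returns `{"AAC": 0.0, "ACA": 0.5, "CAA": 0.0}`. The polymorphic site at position 2 sits in one of the two `ACA` contexts. On a genome-wide scale, the same counting over heptamers yields one probability per canonical 7-mer, and a real analysis would need to account for sampling noise in rare contexts, which this sketch ignores.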
