
Deciphering the Impact of Genetic Variation on Human Polyadenylation Using APARENT2

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2022 Nov 6
PMID: 36335397
Abstract

Background: 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging.

Results: We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells.

Conclusions: A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
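
The in silico saturation mutagenesis mentioned in the Results can be illustrated with a minimal sketch: every position of a sequence is substituted with each alternative base, and the change in a model score relative to the reference is recorded. The scoring function below (`toy_pas_score`) is a hypothetical stand-in that only counts two canonical polyadenylation hexamers; it is not APARENT2 and does not reflect its interface or its predictions. The function names and the example sequence are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_pas_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model (NOT APARENT2).

    Scores a sequence by counting the canonical hexamer AATAAA (weight 1.0)
    and the common variant ATTAAA (weight 0.5).
    """
    return 1.0 * seq.count("AATAAA") + 0.5 * seq.count("ATTAAA")


def saturation_mutagenesis(
    seq: str, score_fn: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score every single-nucleotide substitution relative to the reference.

    Returns a mapping (0-based position, alternative base) -> score difference,
    where negative values mean the substitution lowers the score.
    """
    ref_score = score_fn(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score_fn(mutant) - ref_score
    return effects


# Illustrative sequence containing one AATAAA hexamer at positions 4-9.
seq = "GCTGAATAAAGTCTG"
effects = saturation_mutagenesis(seq, toy_pas_score)

# List the substitutions with the most negative predicted effect first.
for (pos, alt), delta in sorted(effects.items(), key=lambda kv: kv[1])[:5]:
    print(f"pos {pos} {seq[pos]}>{alt}: {delta:+.1f}")
```

With a real sequence-to-function model in place of the toy scorer, the same loop yields one predicted effect per possible substitution, which is the kind of table that can then be compared against observed variant frequencies.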

Citing Articles

Impact of rare non-coding variants on human diseases through alternative polyadenylation outliers.

Zou X, Zhao Z, Chen Y, Xiong K, Wang Z, Chen S Nat Commun. 2025; 16(1):682.

PMID: 39819850 PMC: 11739498. DOI: 10.1038/s41467-024-55407-3.


RBBP6 anchors pre-mRNA 3' end processing to nuclear speckles for efficient gene expression.

Yoon Y, Bournique E, Soles L, Yin H, Chu H, Yin C Mol Cell. 2025; 85(3):555-570.e8.

PMID: 39798570 PMC: 11805622. DOI: 10.1016/j.molcel.2024.12.016.


Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation.

Linder J, Srivastava D, Yuan H, Agarwal V, Kelley D Nat Genet. 2025; .

PMID: 39779956 DOI: 10.1038/s41588-024-02053-6.


Active learning of enhancers and silencers in the developing neural retina.

Friedman R, Ramu A, Lichtarge S, Wu Y, Tripp L, Lyon D Cell Syst. 2025; 16(1):101163.

PMID: 39778579 PMC: 11827711. DOI: 10.1016/j.cels.2024.12.004.


Recurrent disruption of tumour suppressor genes in cancer by somatic mutations in cleavage and polyadenylation signals.

Kainov Y, Hamid F, Makeyev E Elife. 2024; 13.

PMID: 39660592 PMC: 11634062. DOI: 10.7554/eLife.99040.

