Meta-imputation of Transcriptome from Genotypes Across Multiple Datasets by Leveraging Publicly Available Summary-level Data

Overview

Journal PLoS Genet

Specialty Genetics

Date 2022 Jan 31

PMID 35100255

Authors

Andrew E Liu

Hyun Min Kang

Abstract

Transcriptome-wide association studies (TWAS) are a powerful method for identifying and interpreting the biological mechanisms underlying GWAS findings by associating gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to imputing expression levels in a specific tissue is to use a model trained on the same tissue type. When multiple tissues are available for the same subjects, training imputation models on multiple tissue types has been shown to improve accuracy because eQTLs are shared between tissues and the effective sample size increases. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets from various tissues in non-overlapping individuals. Here, we explore the optimal way to flexibly combine imputed expression levels from models trained on multiple tissues and datasets using summary-level data. Our proposed method (SWAM) linearly combines an arbitrary number of transcriptome imputation models to optimize imputation accuracy for a given target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while requiring individual-level data from only a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from a large eQTL study, the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results underscore the importance of integrating multiple tissues to unravel the regulatory impacts of genetic variants on complex traits.
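
The idea of linearly combining several imputation models for one target tissue can be illustrated with a minimal sketch. This is not the SWAM implementation: the abstract does not specify the estimator, so the ridge-regularized least-squares fit below is an assumption, and the function names `fit_weights` and `meta_impute` are hypothetical. For a single gene, each column of `imputed` holds the expression predicted by one tissue- or dataset-specific model in a reference cohort, and `observed` holds the measured expression in the target tissue for the same individuals.

```python
import numpy as np


def fit_weights(imputed, observed, ridge=0.0):
    """Fit linear weights that combine per-model imputed expression.

    imputed  : (n_samples, n_models) array, one column per imputation model
    observed : (n_samples,) measured expression in the target tissue
    ridge    : assumed regularization strength (0.0 gives ordinary least squares)
    """
    imputed = np.asarray(imputed, dtype=float)
    observed = np.asarray(observed, dtype=float)

    # Center both sides so the intercept can be recovered afterwards.
    mean_x = imputed.mean(axis=0)
    mean_y = observed.mean()
    x = imputed - mean_x
    y = observed - mean_y

    n_samples, n_models = x.shape
    # Normal equations with an optional ridge penalty on the weights.
    gram = x.T @ x + ridge * n_samples * np.eye(n_models)
    weights = np.linalg.solve(gram, x.T @ y)
    intercept = mean_y - mean_x @ weights
    return weights, intercept


def meta_impute(imputed, weights, intercept):
    """Combine per-model imputed expression with previously fitted weights."""
    return intercept + np.asarray(imputed, dtype=float) @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated imputed expression from three hypothetical models.
    models = rng.normal(size=(200, 3))
    target = 1.5 + models @ np.array([0.6, 0.3, 0.0])
    w, b = fit_weights(models, target)
    print(w, b)
```

In this sketch the weights are fitted in one cohort that has individual-level data and can then be applied to imputed values in any other genotyped cohort, which mirrors the abstract's point that only a single reference cohort needs individual-level data.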

Citing Articles

SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning.

Parrish R, Buchman A, Tasaki S, Wang Y, Avey D, Xu J. Nat Commun. 2024; 15(1):6646.

PMID: 39103319 PMC: 11300466. DOI: 10.1038/s41467-024-50983-w.

OTTERS: a powerful TWAS framework leveraging summary-level reference data.

Dai Q, Zhou G, Zhao H, Vosa U, Franke L, Battle A. Nat Commun. 2023; 14(1):1271.

PMID: 36882394 PMC: 9992663. DOI: 10.1038/s41467-023-36862-w.
