
Meta-imputation of Transcriptome from Genotypes Across Multiple Datasets by Leveraging Publicly Available Summary-level Data

Overview
Journal: PLoS Genet
Specialty: Genetics
Date: 2022 Jan 31
PMID: 35100255
Abstract

Transcriptome-wide association studies (TWAS) are a powerful way to identify and interpret the biological mechanisms underlying GWAS signals by associating gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to imputing expression levels for a specific tissue is to use the model trained on the same tissue type. When multiple tissues are available for the same subjects, it has been demonstrated that training imputation models on multiple tissue types improves accuracy, because eQTLs are shared between tissues and the effective sample size increases. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets collected across various tissues from non-overlapping individuals. Here, we explore the optimal way to combine imputed expression levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize the imputation accuracy for a given target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while requiring individual-level data from only a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project with a model from a large eQTL study, the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results highlight the importance of integrating multiple tissues to unravel the regulatory impacts of genetic variants on complex traits.
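To make the idea of linearly combining imputation models concrete, here is a minimal sketch in Python. It is not the SWAM algorithm itself: the abstract does not give the estimator, so ordinary least squares is used as a stand-in. The function names (`fit_combination_weights`, `combine`), the simulated data, and the noise levels are all assumptions made for illustration. For one gene, each column of `imputed` holds the expression predicted by one hypothetical model, and the weights are chosen to best match expression measured in a reference cohort for the target tissue.

```python
import numpy as np

def fit_combination_weights(imputed, measured):
    """Least-squares weights for combining imputed expression vectors.

    imputed  : (n_samples, n_models) array, one column per imputation model
    measured : (n_samples,) measured expression in the target tissue
    NOTE: ordinary least squares is an illustrative assumption, not the
    estimator described in the paper.
    """
    # Center both sides so that no explicit intercept term is needed.
    X = imputed - imputed.mean(axis=0)
    y = measured - measured.mean()
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def combine(imputed, weights):
    """Weighted linear combination of the per-model imputed values."""
    return imputed @ weights

# Simulated reference cohort for a single gene (all values hypothetical).
rng = np.random.default_rng(0)
n_samples, n_models = 200, 5
genetic_signal = rng.normal(size=n_samples)

# Each model recovers the shared signal with a different amount of noise,
# mimicking tissues that are more or less informative for the target.
noise_sd = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
imputed = genetic_signal[:, None] + rng.normal(size=(n_samples, n_models)) * noise_sd
measured = genetic_signal + rng.normal(scale=0.5, size=n_samples)

weights = fit_combination_weights(imputed, measured)
combined = combine(imputed, weights)

# Compare the accuracy (Pearson r) of each single model with the combination.
single_r = [np.corrcoef(imputed[:, k], measured)[0, 1] for k in range(n_models)]
combined_r = np.corrcoef(combined, measured)[0, 1]
print("single-model r:", np.round(single_r, 3))
print("combined r:", round(float(combined_r), 3))
```

Because the weights are fitted and evaluated on the same samples, the combined correlation here cannot fall below that of the best single model. A realistic evaluation would measure accuracy in held-out samples, as the abstract describes for the GEUVADIS data.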

Citing Articles

SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning.

Parrish R, Buchman A, Tasaki S, Wang Y, Avey D, Xu J Nat Commun. 2024; 15(1):6646.

PMID: 39103319 PMC: 11300466. DOI: 10.1038/s41467-024-50983-w.


OTTERS: a powerful TWAS framework leveraging summary-level reference data.

Dai Q, Zhou G, Zhao H, Vosa U, Franke L, Battle A Nat Commun. 2023; 14(1):1271.

PMID: 36882394 PMC: 9992663. DOI: 10.1038/s41467-023-36862-w.
