Comparative Analysis of Genotype Imputation Strategies for SNPs Calling from RNA-seq

Overview
Journal BMC Genomics
Publisher BioMed Central
Specialty Genetics
Date 2025 Mar 14
PMID 40082746
Abstract

Background: RNA sequencing (RNA-seq) is a powerful tool for transcriptome profiling, enabling integrative studies of expression quantitative trait loci (eQTL). Because RNA-seq identifies fewer genetic variants than DNA sequencing (DNA-seq), reference panel-based genotype imputation is often required to enhance its utility.

Results: This study evaluated the accuracy of genotype imputation using SNPs called from RNA-seq data (RNA-SNPs). SNP features from 6,567 RNA-seq samples across 28 pig tissues were used to mask whole-genome sequencing (WGS) data, with the Pig Genomic Reference Panel (PGRP) serving as the reference panel. Three imputation tools (Beagle, Minimac4, and Impute5) were used to perform the imputation. The results showed that RNA-SNPs achieved higher imputation accuracy (CR: 0.895 ~ 0.933; r²: 0.745 ~ 0.817) than SNPs from the GeneSeek Genomic Profiler Porcine SNP50 BeadChip (Chip-SNPs) (CR: 0.873 ~ 0.909; r²: 0.629 ~ 0.698), but lower accuracy in "intergenic" regions. After imputation, quality control (QC) by minor allele frequency (MAF) and imputation quality (DR²) improved r² but reduced SNP retention. Among the tools, Minimac4 had the shortest runtime in the single-thread setting, while Beagle performed best in the multi-thread setting and in phasing. Impute5 used the least memory but required the longest runtime. All tools showed comparable global accuracy (CR: 0.906 ~ 0.917; r²: 0.780 ~ 0.787).
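
The accuracy measures and the post-imputation QC step reported above can be illustrated with a minimal sketch. It assumes that CR denotes the concordance rate (the fraction of imputed genotypes that match the masked true genotypes) and that r² is the squared Pearson correlation between true and imputed genotypes, both coded as alternate-allele counts (0, 1, 2). The function names and the QC thresholds (`min_maf`, `min_dr2`) are hypothetical and are not taken from the study.

```python
def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotype calls (0/1/2 alt-allele counts) that match exactly."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation; None if either input has zero variance."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        return None  # correlation is undefined for a monomorphic site
    return cov * cov / (var_t * var_i)


def minor_allele_frequency(genotypes):
    """MAF from diploid alt-allele counts (two alleles per sample)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def passes_qc(maf, dr2, min_maf=0.05, min_dr2=0.8):
    """Keep a SNP only if both MAF and DR2 reach the (hypothetical) thresholds."""
    return maf >= min_maf and dr2 >= min_dr2


# Toy data for one SNP across six samples (illustrative only, not study data).
true_gt = [0, 1, 2, 1, 0, 2]
imputed_gt = [0, 1, 2, 2, 0, 2]
cr = concordance_rate(true_gt, imputed_gt)
r2 = squared_correlation(true_gt, imputed_gt)
maf = minor_allele_frequency(true_gt)
```

Raising `min_maf` or `min_dr2` in a filter of this kind removes more poorly imputed SNPs, which is consistent with the reported trade-off between higher r² and lower SNP retention.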

Conclusions: This study offers practical guidance for applying RNA-SNP imputation strategies in genome and transcriptome research.
