
Ancestry Patterns Inferred from Massive RNA-seq Data

Overview
Journal RNA
Specialty Molecular Biology
Date 2019 Apr 24
PMID 31010885
Citations 12
Abstract

There is growing evidence that patterns of gene expression vary within and between human populations. However, the impact of this variation on human disease has been poorly explored, in part owing to the lack of a standardized protocol for estimating biogeographical ancestry in gene expression studies. Here we examine several studies that provide solid new evidence that the ancestral background of individuals affects gene expression patterns. Next, we test a procedure to infer genetic ancestry from RNA-seq data in 25 data sets for which information on ethnicity was reported. Genome data from reference continental populations, retrieved from The 1000 Genomes Project, were used for comparison. Remarkably, only eight of the 25 data sets passed FastQC default filters. We demonstrate that, for these eight data sets, the ancestral background of donors could be inferred very efficiently, even in data sets that include samples with complex patterns of admixture (e.g., American-admixed populations). For most of the gene expression data sets of suboptimal quality, ancestral inference yielded odd patterns. The present study thus sounds a cautionary note for gene expression studies, highlighting the importance of controlling for the potential confounding effect of ancestral genetic background.
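
The abstract does not say which algorithm was used to compare RNA-seq-derived genotypes with the reference populations. Purely as an illustration of the general idea, the sketch below assumes genotypes are already coded as 0/1/2 allele counts per SNP, projects query samples onto principal components computed from a labeled reference panel, and assigns each sample to the nearest reference centroid. The function names `infer_ancestry` and `simulate`, the population labels, and the toy allele frequencies are all hypothetical; this is not the authors' procedure and it does not handle admixed samples.

```python
import numpy as np


def infer_ancestry(ref_genotypes, ref_labels, query_genotypes, n_components=2):
    """Assign each query sample to the closest reference population in PCA space.

    Illustrative assumption only: PCA plus nearest centroid, not the
    procedure used in the study.
    """
    ref = np.asarray(ref_genotypes, dtype=float)
    query = np.asarray(query_genotypes, dtype=float)
    labels = np.asarray(ref_labels)

    # Center both panels on the per-SNP mean of the reference panel.
    snp_mean = ref.mean(axis=0)
    ref_centered = ref - snp_mean
    query_centered = query - snp_mean

    # Principal axes of the reference panel from a thin SVD.
    _, _, vt = np.linalg.svd(ref_centered, full_matrices=False)
    axes = vt[:n_components].T  # shape: (n_snps, n_components)

    ref_pcs = ref_centered @ axes
    query_pcs = query_centered @ axes

    # Centroid of each reference population in PC space.
    populations = sorted(set(labels.tolist()))
    centroids = np.array([ref_pcs[labels == p].mean(axis=0) for p in populations])

    # Nearest-centroid assignment for every query sample.
    dists = np.linalg.norm(query_pcs[:, None, :] - centroids[None, :, :], axis=2)
    return [populations[i] for i in dists.argmin(axis=1)]


def simulate(rng, freqs, n_samples):
    """Draw 0/1/2 genotypes from per-SNP allele frequencies (toy data only)."""
    return rng.binomial(2, freqs, size=(n_samples, len(freqs)))
```

With real data, the reference matrix would hold genotypes at SNPs shared between the reference panel and the RNA-seq variant calls, and samples with complex admixture would call for a model that estimates ancestry proportions rather than a single label.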

Citing Articles

Equitable machine learning counteracts ancestral bias in precision medicine.

Smith L, Cahill J, Lee J, Graim K. Nat Commun. 2025; 16(1):2144.

PMID: 40064867 PMC: 11894161. DOI: 10.1038/s41467-025-57216-8.


A diagnostic host-specific transcriptome response for Mycoplasma pneumoniae pneumonia to guide pediatric patient treatment.

Viz-Lasheras S, Gomez-Carballa A, Bello X, Rivero-Calle I, Dacosta A, Kaforou M. Nat Commun. 2025; 16(1):673.

PMID: 39809748 PMC: 11733158. DOI: 10.1038/s41467-025-55932-9.


Cross-population enhancement of PrediXcan predictions with a gnomAD-based east Asian reference framework.

Chan H, Chattopadhyay A, Lu T. Brief Bioinform. 2024; 25(6).

PMID: 39441246 PMC: 11497844. DOI: 10.1093/bib/bbae549.


Whole exome sequencing identifies new susceptibility candidates underlying community-acquired pneumonia.

Pardo-Seco J, Viz-Lasheras S, Bello X, Gomez-Carballa A, Camino-Mera A, Pischedda S. Genes Dis. 2024; 11(6):101170.

PMID: 39157457 PMC: 11327392. DOI: 10.1016/j.gendis.2023.101170.


Genotype prediction of 336,463 samples from public expression data.

Razi A, Lo C, Wang S, Leek J, Hansen K. bioRxiv. 2024.

PMID: 38559266 PMC: 10979922. DOI: 10.1101/2023.10.21.562237.

