
TransTEx: Novel Tissue-specificity Scoring Method for Grouping Human Transcriptome into Different Expression Groups

Overview
Journal Bioinformatics
Specialty Biology
Date 2024 Aug 9
PMID 39120880
Abstract

Motivation: Although human tissues carry out common molecular processes, gene expression patterns can distinguish different tissues. Traditional informatics methods, primarily at the gene level, overlook the complexity of alternative transcript variants and protein isoforms produced by most genes, changes in which are linked to disease prognosis and drug resistance.

Results: We developed TransTEx (Transcript-level Tissue Expression), a novel tissue-specificity scoring method, for grouping transcripts into four expression groups. TransTEx applies sequential cut-offs to tissue-wise transcript probability estimates, subsampling-based P-values and fold-change estimates. Applying TransTEx to GTEx mRNA-seq data divided 199 166 human transcripts into 17 999 tissue-specific (TSp), 7436 tissue-enhanced, 36 783 widely expressed (Wide), 79 191 lowly expressed (Low), and 57 757 no expression (Null) transcripts. Testis has the most TSp isoforms (13 466), followed by liver (890), brain (701), pituitary (435), and muscle (420). We found that the tissue specificity of alternative transcripts of a gene is predominantly influenced by alternative promoter usage. By overlapping brain-specific transcripts with the cell-type gene markers in the scBrainMap database, we found that 63% of the brain-specific transcripts were enriched in nonneuronal cell types, predominantly astrocytes, followed by endothelial cells and oligodendrocytes. In addition, we found 61 brain cell-type marker genes encoding a total of 176 alternative transcripts that are brain-specific and 22 alternative transcripts that are testis-specific, highlighting the complexity of tissue-specific and cell-type-specific gene regulation and expression at the isoform level. TransTEx can be adapted to the analysis of bulk RNA-seq or scRNA-seq datasets to find tissue- and/or cell-type-specific isoform-level gene markers.
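
To make the idea of sequential cut-offs concrete, the following is a minimal Python sketch of how per-tissue probability, P-value and fold-change estimates for one transcript could be turned into a group label. It is not the TransTEx implementation (which is distributed as an R package). The threshold values, the order of the checks, the `TEnh` label, and the names `Cutoffs`, `TissueStats` and `assign_group` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cutoffs:
    # Every value here is an illustrative assumption, not a threshold from the paper.
    expressed_prob: float = 0.5   # min. probability to call a transcript expressed in a tissue
    p_value: float = 0.05         # max. subsampling-based P-value for a tissue to count as a hit
    fold_change: float = 4.0      # min. fold change of a tissue over the others
    max_enhanced: int = 5         # max. number of hit tissues still labelled tissue-enhanced
    wide_fraction: float = 0.8    # min. fraction of expressing tissues to label as Wide


@dataclass(frozen=True)
class TissueStats:
    prob: float         # tissue-wise probability that the transcript is expressed
    p_value: float      # subsampling-based P-value for this tissue
    fold_change: float  # fold change of this tissue relative to the others


def assign_group(stats: Dict[str, TissueStats], c: Cutoffs = Cutoffs()) -> str:
    """Apply the cut-offs one after another and return a group label."""
    if not stats:
        raise ValueError("at least one tissue is required")

    # Step 1: no evidence of expression in any tissue.
    if max(s.prob for s in stats.values()) == 0.0:
        return "Null"

    # Step 2: never confidently expressed anywhere.
    expressed = [t for t, s in stats.items() if s.prob >= c.expressed_prob]
    if not expressed:
        return "Low"

    # Step 3: tissues that pass both the P-value and the fold-change cut-off.
    hits = [
        t for t in expressed
        if stats[t].p_value <= c.p_value and stats[t].fold_change >= c.fold_change
    ]
    if len(hits) == 1:
        return "TSp"
    if 2 <= len(hits) <= c.max_enhanced:
        return "TEnh"

    # Step 4: no distinct tissue signal; decide by breadth of expression.
    if len(expressed) / len(stats) >= c.wide_fraction:
        return "Wide"
    return "Low"
```

For example, a transcript with a high probability, a small P-value and a large fold change in testis only would be labelled `TSp` under these assumed cut-offs, while one expressed at similar levels in every tissue would be labelled `Wide`.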

Availability And Implementation: The TransTEx database is available at https://bmi.cewit.stonybrook.edu/transtexdb/ and the R package is available via GitHub: https://github.com/pallavisurana1/TransTEx.
