New Functionalities in the TCGAbiolinks Package for the Study and Integration of Cancer Data from GDC and GTEx

Overview
Specialty Biology
Date 2019 Mar 6
PMID 30835723
Citations 259
Abstract

The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. The amount of genomic data is now massive, and substantial effort and new tools are required to extract the information hidden in it. The Genomic Data Commons (GDC) Data Portal is a platform that hosts many genomic studies, including those from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30,000 patients. Such platforms, although very attractive, must ensure that the stored data are easily accessible and adequately harmonized. Moreover, they focus primarily on storing data in a single place and do not provide a comprehensive toolkit for analyzing and interpreting the data. To fill this gap, comprehensive yet easily accessible computational methods for integrative analysis of genomic data are required, without giving up a robust statistical and theoretical framework. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks: i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, because differences in purity can confound differential expression analysis. With this in mind, we implemented purity-based filtering procedures in TCGAbiolinks. Moreover, a limitation of some TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility of using normal samples from the Genotype-Tissue Expression (GTEx) project, another large-scale repository, which catalogs gene expression from healthy individuals. The new functionalities are available in TCGAbiolinks version 2.8 and higher, released in Bioconductor version 3.7.

Citing Articles

QSAR-Based Drug Repurposing and RNA-Seq Metabolic Networks Highlight Treatment Opportunities for Hepatocellular Carcinoma Through Pyrimidine Starvation.

Talubo N, Dela Cruz E, Fowler P, Tsai P, Tayo L. Cancers (Basel). 2025; 17(5).

PMID: 40075750 PMC: 11898721. DOI: 10.3390/cancers17050903.


Sex Differences in Cancer Functional Genomics: Gene Dependency and Drug Sensitivity.

Zeltser N, Zhu C, Oh J, Li C, Boutros P. bioRxiv. 2025.

PMID: 39975298 PMC: 11838570. DOI: 10.1101/2025.02.05.636540.


Modulation of tumor inflammatory signaling and drug sensitivity by CMTM4.

Xu Y, Kang K, Coakley B, Eisenstein S, Parveen A, Mai S. EMBO J. 2025.

PMID: 39948411 DOI: 10.1038/s44318-024-00330-y.


VIBE: an R-package for VIsualization of Bulk RNA Expression data for therapeutic targeting and disease stratification.

Khatri I, van Asten S, Moreno L, Higgs B, Klijn C, Blokzijl F. Front Oncol. 2025; 14:1441133.

PMID: 39943991 PMC: 11815282. DOI: 10.3389/fonc.2024.1441133.


Multi-Omics Analysis Reveals Immune Infiltration and Clinical Significance of Phosphorylation Modification Enzymes in Lung Adenocarcinoma.

Long D, Ding Y, Wang P, Wei L, Ma K. Int J Mol Sci. 2025; 26(3).

PMID: 39940833 PMC: 11817228. DOI: 10.3390/ijms26031066.

