CAMUR: Knowledge Extraction from RNA-seq Cancer Data Through Equivalent Classification Rules

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2015 Nov 1
PMID: 26519501
Citations: 13
Abstract

Motivation: Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis, and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to elicit more knowledge by computing many classification models, and thereby to identify most of the genes related to the predicted class.

Results: We propose CAMUR, a new method that extracts multiple equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad hoc knowledge repository (database) and a querying tool. We analyze three types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA), and we also validate CAMUR and its models on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and their relation to a particular cancer are deduced.
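
The loop described above (learn rules, take the power set of the genes used, hide each combination, learn again) can be illustrated with a minimal Python sketch. It is not the CAMUR implementation: `learn_rules` is a hypothetical toy learner that greedily combines single-gene threshold rules, accuracy is measured on the training samples for brevity, eliminations are assumed to accumulate across iterations, and `min_accuracy` and `max_models` are assumed stand-ins for the stopping criterion. The data set and the gene names `G1` to `G3` are invented.

```python
from itertools import combinations

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

def cut_points(X, gene):
    # Midpoints between consecutive distinct expression values of one gene.
    values = sorted({row[gene] for row in X})
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def learn_rules(X, y, genes, max_rules=2):
    """Toy stand-in for a rule learner (assumption, not CAMUR's classifier):
    greedily OR together rules 'gene above/below t => case'; default is control."""
    rules, fired = [], [False] * len(y)
    best_acc = accuracy(fired, y)
    for _ in range(max_rules):
        best = None
        for g in genes:
            for t in cut_points(X, g):
                for sign in (1, -1):
                    new = [f or sign * (row[g] - t) > 0 for f, row in zip(fired, X)]
                    acc = accuracy(new, y)
                    if acc > best_acc:  # keep only strict improvements
                        best_acc, best = acc, ((g, t, sign), new)
        if best is None:
            break
        rules.append(best[0])
        fired = best[1]
    return rules, best_acc

def equivalent_models(X, y, genes, min_accuracy=0.9, max_models=10):
    """Collect alternative models by hiding gene combinations and re-learning."""
    models = {}  # frozenset of genes used -> accuracy
    queue, seen = [frozenset()], {frozenset()}
    while queue and len(models) < max_models:  # assumed stopping criterion
        removed = queue.pop(0)
        visible = [g for g in genes if g not in removed]
        rules, acc = learn_rules(X, y, visible)
        if not rules or acc < min_accuracy:
            continue  # model not reliable enough to count as equivalent
        used = sorted({g for g, _, _ in rules})
        models.setdefault(frozenset(used), acc)
        # Non-empty power set of the genes in the rules: each subset is a
        # combination to eliminate (cumulatively, by assumption) next.
        for size in range(1, len(used) + 1):
            for subset in combinations(used, size):
                nxt = removed | frozenset(subset)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return [(sorted(g), a) for g, a in models.items()]

# Invented toy data: G1 and G2 each separate cases from controls, G3 does not.
GENES = ["G1", "G2", "G3"]
X = [dict(zip(GENES, v)) for v in [(9, 7, 5), (8, 7, 1), (9, 6, 5), (8, 8, 1),
                                   (1, 2, 5), (2, 3, 1), (1, 2, 5), (2, 1, 1)]]
y = [True] * 4 + [False] * 4
models = equivalent_models(X, y, GENES)
print(models)
```

On this toy data the sketch finds two single-gene models, one on `G1` and one on `G2`, and discards `G3` because no rule on it improves on the default class. Counting how often each gene appears across the returned models would give a rough analogue of the "most frequent genes" mentioned in the abstract.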

Availability and implementation: dmb.iasi.cnr.it/camur.php

Contact: emanuel@iasi.cnr.it

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Unveiling epigenetic regulatory elements associated with breast cancer development.

Jardanowska-Kotuniak M, Draminski M, Wlasnowolski M, Lapinski M, Sengupta K, Agarwal A. bioRxiv. 2024.

PMID: 39605637 PMC: 11601335. DOI: 10.1101/2024.11.12.623187.


Ten quick tips for avoiding pitfalls in multi-omics data integration analyses.

Chicco D, Cumbo F, Angione C. PLoS Comput Biol. 2023; 19(7):e1011224.

PMID: 37410704 PMC: 10325053. DOI: 10.1371/journal.pcbi.1011224.


Characterizing the extracellular matrix transcriptome of cervical, endometrial, and uterine cancers.

Cook C, Miller A, Barker T, Di Y, Fogg K. Matrix Biol Plus. 2022; 15:100117.

PMID: 35898192 PMC: 9309672. DOI: 10.1016/j.mbplus.2022.100117.


Using Class-Specific Feature Selection for Cancer Detection with Gene Expression Profile Data of Platelets.

Yuan L, Sun Y, Huang G. Sensors (Basel). 2020; 20(5).

PMID: 32164283 PMC: 7085688. DOI: 10.3390/s20051528.


Knowledge Generation with Rule Induction in Cancer Omics.

Scala G, Federico A, Fortino V, Greco D, Majello B. Int J Mol Sci. 2019; 21(1).

PMID: 31861438 PMC: 6981587. DOI: 10.3390/ijms21010018.

