CAMUR: Knowledge Extraction from RNA-seq Cancer Data Through Equivalent Classification Rules
Overview
Motivation: Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis, and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to elicit more knowledge by computing many classification models, and thereby to identify most of the genes related to the predicted class.
Results: We propose CAMUR, a new method that extracts multiple equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA), and we also validate CAMUR and its models on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and their relation to a particular cancer are deduced.
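The iterative procedure described in Results can be illustrated with a minimal Python sketch. It is not the CAMUR implementation: the function names, the toy learner (`toy_rule_learner`, which simply ANDs the two best single-gene threshold rules), the accuracy threshold, the model cap used as a stopping criterion, and the gene names and expression values are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def power_set(genes):
    """Return every non-empty subset of `genes` as a frozenset."""
    ordered = sorted(genes)
    return [frozenset(c)
            for size in range(1, len(ordered) + 1)
            for c in combinations(ordered, size)]


def stump_predictions(values, labels):
    """Single-gene threshold rule: split at the midpoint of the class means."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    case_mean = sum(cases) / len(cases)
    control_mean = sum(controls) / len(controls)
    threshold = (case_mean + control_mean) / 2
    case_is_high = case_mean > control_mean
    return [int((v > threshold) == case_is_high) for v in values]


def accuracy(predicted, labels):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def toy_rule_learner(data, labels, genes):
    """Hypothetical stand-in for a rule-based classifier (assumption)."""
    if not genes:
        return frozenset(), 0.0
    preds = {g: stump_predictions(data[g], labels) for g in genes}
    ranked = sorted(genes, key=lambda g: (-accuracy(preds[g], labels), g))
    chosen = ranked[:2]
    # Predict "case" only when every chosen single-gene rule says "case".
    combined = [int(all(preds[g][i] for g in chosen))
                for i in range(len(labels))]
    return frozenset(chosen), accuracy(combined, labels)


def extract_equivalent_models(data, labels, learn,
                              min_accuracy=0.9, max_models=50):
    """Collect models by repeatedly excluding gene subsets and re-learning."""
    models, seen_models = [], set()
    queue, visited = deque([frozenset()]), {frozenset()}
    # Assumed stopping criterion: nothing left to try, or the model cap is hit.
    while queue and len(models) < max_models:
        excluded = queue.popleft()
        available = [g for g in data if g not in excluded]
        used, acc = learn(data, labels, available)
        if not used or acc < min_accuracy:
            continue  # model too weak: do not expand this branch
        if used not in seen_models:
            seen_models.add(used)
            models.append((used, acc))
        # Each subset of the model's genes defines a new reduced data set.
        for subset in power_set(used):
            candidate = excluded | subset
            if candidate not in visited:
                visited.add(candidate)
                queue.append(candidate)
    return models


# Toy expression values (invented): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
data = {
    "GENE_A": [9, 8, 9, 1, 2, 1],
    "GENE_B": [7, 8, 7, 2, 1, 2],
    "GENE_C": [6, 7, 6, 3, 2, 3],
    "GENE_D": [5, 1, 5, 1, 5, 1],  # noisy gene
}
models = extract_equivalent_models(data, labels, toy_rule_learner)
for genes, acc in models:
    print(sorted(genes), acc)
```

On this toy data the loop returns several models with different gene pairs, and the noisy gene never appears in an accepted model. Counting how often each gene occurs across `models` would give the kind of gene-frequency summary the abstract mentions.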
Availability And Implementation: dmb.iasi.cnr.it/camur.php
Contact: emanuel@iasi.cnr.it
Supplementary Information: Supplementary data are available at Bioinformatics online.