
PanClassif: Improving Pan Cancer Classification of Single Cell RNA-seq Gene Expression Data Using Machine Learning

Overview
Journal: Genomics
Specialty: Genetics
Date: 2022 Jan 9
PMID: 34998929
Citations: 5
Abstract

Cancer is one of the leading causes of human death each year. In recent years, cancer identification and classification using machine learning have gained momentum due to the availability of high-throughput sequencing data. RNA-seq is driving rapid progress in cancer research, and new insights into cancer and related treatments are coming to light. In this paper, we propose PanClassif, a method that requires only a small number of effective genes to detect cancer from RNA-seq data and provides a performance gain across a wide range of machine learning classifiers. We use samples of 22 cancer types from The Cancer Genome Atlas (TCGA), comprising 8287 cancer samples and 680 normal samples. First, PanClassif applies k-Nearest Neighbour (k-NN) smoothing to the samples to handle noise in the data. Then, effective genes are selected by an ANOVA-based test. To balance the training data, PanClassif applies an oversampling method, SMOTE. We have performed comprehensive experiments on the datasets using several classification algorithms. Experimental results show that PanClassif outperforms existing state-of-the-art methods and performs consistently on two single-cell RNA-seq datasets taken from Gene Expression Omnibus (GEO). PanClassif improves the performance of a wide variety of classifiers for both binary cancer prediction and multi-class cancer classification. PanClassif is available as a Python package (https://pypi.org/project/panclassif/). All source code and materials for PanClassif are available at https://github.com/Zwei-inc/panclassif.
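The three preprocessing steps named in the abstract (k-NN smoothing, ANOVA-based gene selection, SMOTE oversampling) can be illustrated with a minimal sketch. This is not the API of the panclassif package: every function name, parameter, and the toy data below are hypothetical. Two simplifications are assumed. Plain neighbour averaging stands in for k-NN smoothing, and a hand-written interpolation stands in for SMOTE (in practice a library implementation such as the one in imbalanced-learn would normally be used). Only NumPy and scikit-learn are required.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.neighbors import NearestNeighbors


def knn_smooth(X, k=5):
    """Simplified smoothing: replace each sample (row) by the mean of
    itself and its k nearest neighbours in expression space."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)          # each row: the sample plus k neighbours
    return X[idx].mean(axis=1)         # (n, k+1, genes) -> (n, genes)


def select_genes_anova(X, y, n_genes=10):
    """Return sorted column indices of the n_genes genes with the
    largest one-way ANOVA F-statistic between classes."""
    f_stat, _ = f_classif(X, y)
    f_stat = np.nan_to_num(f_stat, nan=0.0)   # guard against constant genes
    return np.sort(np.argsort(f_stat)[::-1][:n_genes])


def smote_like_oversample(X, y, k=5, seed=0):
    """SMOTE-style oversampling (assumed simplification): create synthetic
    samples by interpolating between a sample and one of its k nearest
    same-class neighbours until every class matches the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue
        X_cls = X[y == cls]
        k_eff = min(k, count - 1)
        nn = NearestNeighbors(n_neighbors=k_eff + 1).fit(X_cls)
        _, idx = nn.kneighbors(X_cls)
        base = rng.integers(0, count, size=n_new)
        # columns 1..k_eff skip column 0, which is the sample itself
        neigh = idx[base, rng.integers(1, k_eff + 1, size=n_new)]
        gap = rng.random((n_new, 1))
        X_parts.append(X_cls[base] + gap * (X_cls[neigh] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy data (hypothetical): 20 "normal" and 100 "cancer" samples, 200 genes,
# with the first 10 genes shifted in the cancer class.
rng = np.random.default_rng(42)
n_normal, n_cancer, n_total_genes = 20, 100, 200
X = rng.normal(size=(n_normal + n_cancer, n_total_genes))
y = np.array([0] * n_normal + [1] * n_cancer)
X[y == 1, :10] += 3.0

X_smooth = knn_smooth(X, k=5)
genes = select_genes_anova(X_smooth, y, n_genes=10)
X_bal, y_bal = smote_like_oversample(X_smooth[:, genes], y, k=5)
```

After these steps, `X_bal` and `y_bal` could be passed to any classifier. In a real evaluation the data would be split first and oversampling applied to the training portion only, since the abstract states that SMOTE is used to balance the training data.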

Citing Articles

Advancing precision medicine: the transformative role of artificial intelligence in immunogenomics, radiomics, and pathomics for biomarker discovery and immunotherapy optimization.

Chang L, Liu J, Zhu J, Guo S, Wang Y, Zhou Z. Cancer Biol Med. 2025; 22(1).

PMID: 39749734 PMC: 11795263. DOI: 10.20892/j.issn.2095-3941.2024.0376.


Occlusion enhanced pan-cancer classification via deep learning.

Zhao X, Chen Z, Wang H, Sun H. BMC Bioinformatics. 2024; 25(1):260.

PMID: 39118043 PMC: 11308240. DOI: 10.1186/s12859-024-05870-y.


A platform-independent AI tumor lineage and site (ATLAS) classifier.

Rydzewski N, Shi Y, Li C, Chrostek M, Bakhtiar H, Helzer K. Commun Biol. 2024; 7(1):314.

PMID: 38480799 PMC: 10937974. DOI: 10.1038/s42003-024-05981-5.


Analysis of RNA-Seq data using self-supervised learning for vital status prediction of colorectal cancer patients.

Padegal G, Rao M, Boggaram Ravishankar O, Acharya S, Athri P, Srinivasa G. BMC Bioinformatics. 2023; 24(1):241.

PMID: 37286944 PMC: 10249191. DOI: 10.1186/s12859-023-05347-4.


Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review.

Alharbi F, Vakanski A. Bioengineering (Basel). 2023; 10(2).

PMID: 36829667 PMC: 9952758. DOI: 10.3390/bioengineering10020173.