
A Review on Advancements in Feature Selection and Feature Extraction for High-dimensional NGS Data Analysis

Overview
Publisher Springer
Date 2024 Aug 19
PMID 39158621
Abstract

Recent advances in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the volume and density of data. High-dimensional NGS data, characterized by a large number of genomic, transcriptomic, proteomic, and metagenomic features relative to the number of biological samples, poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques for addressing these challenges: both reduce the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Both can be categorized into statistical and machine learning methods. The present study conducts a comprehensive, comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The techniques surveyed, including deep learning architectures, machine learning algorithms, and statistical methods, have been applied to microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review aims to help readers apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insight into massive and complex NGS and microarray data.
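
The distinction the abstract draws between feature selection (keeping a subset of the original features) and feature extraction (building new features from combinations of the original ones) can be illustrated with a minimal sketch. It is not taken from the review itself. The toy matrix, the function names, the choice of a variance filter for selection, and the choice of a first principal component (computed by power iteration) for extraction are all assumptions made for illustration, and the data is synthetic.

```python
import random


def make_toy_matrix(n_per_group=3, n_features=50, signal=(3, 7), seed=0):
    """Hypothetical samples-by-features matrix with far more features than samples.

    Only the columns listed in `signal` differ between the two sample groups.
    """
    rng = random.Random(seed)
    X = []
    for i in range(2 * n_per_group):
        shift = 2.0 if i < n_per_group else -2.0
        row = [rng.gauss(0.0, 0.1) for _ in range(n_features)]
        for j in signal:
            row[j] += shift
        X.append(row)
    return X


def center(X):
    """Subtract each column's mean."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def select_top_variance(X, k):
    """Feature selection: keep the k original columns with the highest variance."""
    Xc = center(X)
    n, p = len(Xc), len(Xc[0])
    var = [sum(row[j] ** 2 for row in Xc) / (n - 1) for j in range(p)]
    top = sorted(range(p), key=lambda j: var[j], reverse=True)[:k]
    keep = sorted(top)
    return keep, [[row[j] for j in keep] for row in X]


def first_principal_component(X, iters=100, seed=1):
    """Feature extraction: one new feature that is a weighted sum of all columns.

    Power iteration on Xc^T Xc; assumes the data is not constant.
    """
    Xc = center(X)
    n, p = len(Xc), len(Xc[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        s = [sum(r[j] * v[j] for j in range(p)) for r in Xc]            # Xc v
        w = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(p)]  # Xc^T (Xc v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


X = make_toy_matrix()
keep, reduced = select_top_variance(X, k=2)
v, scores = first_principal_component(X)
print("selected feature indices:", keep)
print("PC1 scores per sample:", [round(s, 2) for s in scores])
```

In this sketch the selected columns remain directly interpretable as original features, whereas the extracted component mixes all 50 columns into a single score per sample. That difference reflects the interpretability trade-off the abstract mentions.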

Citing Articles

Comparative analysis of dimensionality reduction techniques for EEG-based emotional state classification.

Sadegh-Zadeh S, Sadeghzadeh N, Soleimani O, Shiry Ghidary S, Movahedi S, Mousavi S. Am J Neurodegener Dis. 2024; 13(4):23-33.

PMID: 39584052 PMC: 11578865. DOI: 10.62347/ZWRY8401.


An overview on olfaction in the biological, analytical, computational, and machine learning fields.

Chiera F, Costa G, Alcaro S, Artese A. Arch Pharm (Weinheim). 2024; 358(1):e2400414.

PMID: 39439128 PMC: 11704061. DOI: 10.1002/ardp.202400414.
