
Enhancing Web Search Result Clustering Model Based on Multiview Multirepresentation Consensus Cluster Ensemble (MMCC) Approach

Overview
Journal PLoS One
Date 2021 Jan 15
PMID 33449949
Citations 17
Abstract

Existing text clustering methods utilize only one representation at a time (single view), whereas documents can be represented by multiple views. A multiview multirepresentation method can therefore enhance clustering quality. Moreover, existing clustering methods that do utilize more than one representation at a time (multiview) use representations of the same nature. Hence, it is reasonable to apply clustering methods to multiple views that represent the data in different ways, in order to create a diverse set of candidate clustering solutions. On this basis, an effective dynamic clustering method must consider combining multiple views of the data, including the semantic view, the lexical view (word weighting), and the topic view, as well as the number of clusters. The main goal of this study is to develop a new method that can improve the performance of web search result clustering (WSRC). An enhanced multiview multirepresentation consensus clustering ensemble (MMCC) method is proposed to create a set of diverse candidate solutions and select high-quality overlapping clusters. The overlapping clusters are obtained from the candidate solutions created by different clustering methods. The framework for developing the proposed MMCC comprises five stages: (1) acquiring the standard datasets (MORESQUE and Open Directory Project-239), which are used to validate search result clustering algorithms, (2) preprocessing the datasets, (3) applying multiview multirepresentation clustering models, (4) using the radius-based cluster number estimation algorithm, and (5) employing the consensus clustering ensemble method. Results show that clustering methods improve when multiview multirepresentation is used. More importantly, the proposed MMCC model improves the overall performance of WSRC compared with all single-view clustering models.
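
The consensus step in stage (5) can be illustrated with a minimal sketch. It is not the authors' MMCC implementation: it assumes a generic co-association (evidence accumulation) scheme, in which two search results are grouped together when more than half of the candidate partitions place them in the same cluster. The function names, the 0.5 threshold, and the three toy label lists are hypothetical, and the sketch produces hard rather than overlapping clusters. It also omits the radius-based cluster number estimation of stage (4).

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    candidate partitions that put items i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Merge items whose co-association exceeds `threshold` and return
    the resulting groups (connected components) as sorted index lists."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over item indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical cluster labels for six search results, one list per view.
semantic_view = [0, 0, 0, 1, 1, 1]
lexical_view = [0, 0, 1, 1, 1, 1]
topic_view = [1, 1, 1, 0, 0, 2]

print(consensus_clusters([semantic_view, lexical_view, topic_view]))
# → [[0, 1, 2], [3, 4, 5]]
```

Label values need not agree across views, because only co-membership within each partition is counted. Here the three views disagree about items 2 and 5, and the majority vote resolves both.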

Citing Articles

The interplay between angiogenesis-associated genes and molecular, clinical, and immune features in bladder cancer.

Guo X, Yang J, Cao R, Hao G Discov Oncol. 2025; 16(1):265.

PMID: 40042726 PMC: 11883062. DOI: 10.1007/s12672-025-01966-w.


Mechanistic insights into PROS1 inhibition of bladder cancer progression and angiogenesis via the AKT/GSK3β/β-catenin pathway.

Fan X, Wang J, Chen S, Li X, Cao J, Wang H Sci Rep. 2025; 15(1):4748.

PMID: 39922934 PMC: 11807197. DOI: 10.1038/s41598-025-89217-4.


The complement C3a/C3aR pathway is associated with treatment resistance to gemcitabine-based neoadjuvant therapy in pancreatic cancer.

Shi S, Ye L, Jin K, Yu X, Guo D, Wu W Comput Struct Biotechnol J. 2024; 23:3634-3650.

PMID: 39469671 PMC: 11513484. DOI: 10.1016/j.csbj.2024.09.032.


Vasculogenic mimicry-related gene prognostic index for predicting prognosis, immune microenvironment in clear cell renal cell carcinoma.

Ou J, Yin H, Shu F, Wu Z, Liu S, Ye J Heliyon. 2024; 10(16):e36235.

PMID: 39247316 PMC: 11380016. DOI: 10.1016/j.heliyon.2024.e36235.


A multi-view representation technique based on principal component analysis for enhanced short text clustering.

Ahmed M, Tiun S, Omar N, Sani N PLoS One. 2024; 19(8):e0309206.

PMID: 39178180 PMC: 11343383. DOI: 10.1371/journal.pone.0309206.

