
Evaluation of the Vector Space Representation in Text-based Gene Clustering

Overview
Publisher World Scientific
Specialty Biology
Date 2003 Feb 27
PMID 12603044
Citations 12
Abstract

Thanks to its increasing availability, electronic literature can now be a major source of information when developing complex statistical models for which data are scarce or noisy. This raises the question of how to integrate information from domain literature deeply with experimental data. Which statistical text representations can carry literature knowledge into clustering remains an insufficiently explored topic. In this work we discuss how the bag-of-words representation can be used successfully to represent genetic annotation and free-text information coming from different databases. We demonstrate the effect of various weighting schemes and information sources in a functional clustering setup. As a quantitative evaluation, for different parameter settings we contrast the functional groupings obtained from text with those obtained from expert assessments, and we link each of the results to a biological discussion.
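
As an illustration of the kind of representation the abstract describes, the following is a minimal sketch of a bag-of-words vector space with one common weighting scheme (term frequency times inverse document frequency) and cosine similarity between the resulting gene vectors. It is not the authors' implementation: the abstract does not say which weighting schemes or similarity measures were evaluated, and the gene names, annotation strings, and function names below are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return {name: {term: weight}} using tf * log(N / df) weights.

    This is one possible weighting scheme, chosen here as an assumption.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())  # each document counts once per term
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return {
        name: {term: tf * idf[term] for term, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical annotation text, one "document" per gene (invented for illustration).
annotations = {
    "geneA": "kinase involved in cell cycle regulation",
    "geneB": "cyclin dependent kinase cell cycle",
    "geneC": "membrane transporter for glucose uptake",
}

vecs = tfidf_vectors(annotations)
sim_ab = cosine(vecs["geneA"], vecs["geneB"])  # the two genes share several terms
sim_ac = cosine(vecs["geneA"], vecs["geneC"])  # the two genes share no terms
```

A pairwise similarity matrix built this way could then be passed to any standard clustering routine, and the resulting groups compared with expert-defined functional classes.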

Citing Articles

Classification of genomes with a bag-of-words approach and machine learning.

Podda M, Bonechi S, Palladino A, Scaramuzzino M, Brozzi A, Roma G. iScience. 2024; 27(3):109257.

PMID: 38439962 PMC: 10910294. DOI: 10.1016/j.isci.2024.109257.


A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

Xiang Z, Qin T, Qin Z, He Y. BMC Syst Biol. 2014; 7 Suppl 3:S9.

PMID: 24555475 PMC: 3852244. DOI: 10.1186/1752-0509-7-S3-S9.


Evaluation of semantic-based information retrieval methods in the autism phenotype domain.

Hassanpour S, O'Connor M, Das A. AMIA Annu Symp Proc. 2011; 2011:569-77.

PMID: 22195112 PMC: 3243127.


IntelliGO: a new vector-based semantic similarity measure including annotation origin.

Benabderrahmane S, Smail-Tabbone M, Poch O, Napoli A, Devignes M. BMC Bioinformatics. 2010; 11:588.

PMID: 21122125 PMC: 3098105. DOI: 10.1186/1471-2105-11-588.


Predicting novel human gene ontology annotations using semantic analysis.

Done B, Khatri P, Done A, Draghici S. IEEE/ACM Trans Comput Biol Bioinform. 2010; 7(1):91-9.

PMID: 20150671 PMC: 3712327. DOI: 10.1109/TCBB.2008.29.