
The Mutual Information: Detecting and Evaluating Dependencies Between Variables

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2002 Oct 19
PMID: 12386007
Citations: 206
Abstract

Motivation: Clustering co-expressed genes usually requires the definition of 'distance' or 'similarity' between measured datasets, the most common choices being Pearson correlation or Euclidean distance. With the size of available datasets steadily increasing, it has become feasible to consider other, more general, definitions as well. One alternative, based on information theory, is the mutual information, providing a general measure of dependencies between variables. While the use of mutual information in cluster analysis and visualization of large-scale gene expression data has been suggested previously, the earlier studies did not focus on comparing different algorithms to estimate the mutual information from finite data.
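To make the contrast with Pearson correlation concrete, here is a minimal sketch of a naive equal-width histogram ("plug-in") estimate of mutual information in bits. It is an illustration under stated assumptions, not one of the estimators evaluated in the paper. The function names `pearson`, `equal_width_bins` and `mutual_information`, and the default of 8 bins, are hypothetical choices. For y = x² on a grid symmetric around zero, the Pearson correlation is essentially zero, while the estimated mutual information is clearly positive.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def equal_width_bins(vs, bins):
    """Map each value to one of `bins` equal-width bins over [min, max] (assumed scheme)."""
    lo, hi = min(vs), max(vs)
    width = (hi - lo) or 1.0  # constant input: every value lands in bin 0
    # The maximum value would map to index `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width * bins), bins - 1) for v in vs]


def mutual_information(xs, ys, bins=8):
    """Naive (plug-in) histogram estimate of I(X;Y) in bits."""
    n = len(xs)
    bx, by = equal_width_bins(xs, bins), equal_width_bins(ys, bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over occupied cells of p_ij * log2(p_ij / (p_i * p_j)),
    # with all probabilities taken as relative frequencies (counts / n).
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


if __name__ == "__main__":
    # A purely nonlinear, symmetric dependence: y = x^2 on a grid over [-1, 1].
    xs = [(k - 100) / 100 for k in range(201)]
    ys = [x * x for x in xs]
    print("Pearson r:", round(pearson(xs, ys), 6))
    print("MI (bits):", round(mutual_information(xs, ys), 3))
```

Because y is fully determined by x, a correlation-based similarity would rate this pair as unrelated, whereas the mutual information registers the dependence. The value obtained does, however, depend on the chosen binning.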

Results: Here we describe and review several approaches to estimate the mutual information from finite datasets. Our findings show that the algorithms used so far may be quite substantially improved upon. In particular, when dealing with small datasets, finite sample effects and other sources of potentially misleading results have to be taken into account.
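The finite sample effects mentioned above can be illustrated with a naive equal-width histogram estimator. The sketch below adds two generic remedies: a first-order (Miller-Madow style) bias correction and a shuffle-based surrogate test. These are standard techniques chosen for illustration and are not necessarily the procedures the authors recommend. All function names, the 8-bin default and the shuffle count are hypothetical.

```python
import math
import random
from collections import Counter


def _bins(vs, bins):
    """Equal-width binning over [min, max]; the maximum goes into the last bin."""
    lo, hi = min(vs), max(vs)
    width = (hi - lo) or 1.0  # guard against constant input
    return [min(int((v - lo) / width * bins), bins - 1) for v in vs]


def plugin_mi(xs, ys, bins=8):
    """Naive histogram MI in bits, plus the numbers of occupied joint/marginal cells."""
    n = len(xs)
    bx, by = _bins(xs, bins), _bins(ys, bins)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = sum((c / n) * math.log2(c * n / (px[i] * py[j]))
             for (i, j), c in joint.items())
    return mi, len(joint), len(px), len(py)


def corrected_mi(xs, ys, bins=8):
    """Subtract the first-order bias (k_xy - k_x - k_y + 1) / (2N), converted to bits."""
    mi, kxy, kx, ky = plugin_mi(xs, ys, bins)
    bias = (kxy - kx - ky + 1) / (2 * len(xs) * math.log(2))
    return mi - bias


def shuffle_pvalue(xs, ys, bins=8, n_shuffles=200, seed=0):
    """Fraction of shuffled surrogates whose plug-in MI is >= the observed MI."""
    rng = random.Random(seed)
    observed = plugin_mi(xs, ys, bins)[0]
    ys_perm = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(ys_perm)  # destroys any X-Y dependence, keeps both marginals
        if plugin_mi(xs, ys_perm, bins)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (50, 5000):
        # Independent inputs: the true mutual information is exactly zero.
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        mi = plugin_mi(xs, ys)[0]
        print(f"N={n}: plug-in {mi:.3f} bits, corrected {corrected_mi(xs, ys):.3f} bits, "
              f"shuffle p={shuffle_pvalue(xs, ys, n_shuffles=100):.3f}")
```

With independent inputs the plug-in estimate is biased upward. The bias is much larger at N = 50 (64 cells, most nearly empty) than at N = 5000. When most cells hold only a few counts, the first-order correction is itself unreliable, so comparing against shuffled surrogates is a useful complement.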

Citing Articles

Differential Transcriptional Programs Reveal Modular Network Rearrangements Associated with Late-Onset Alzheimer's Disease.

Perez-Gonzalez A, Anda-Jauregui G, Hernandez-Lemus E Int J Mol Sci. 2025; 26(5).

PMID: 40076979 PMC: 11900169. DOI: 10.3390/ijms26052361.


Optimizing functional brain network analysis by incorporating nonlinear factors and frequency band selection with machine learning models.

Hu K, Zhong B, Tian R, Yao J Medicine (Baltimore). 2025; 104(9):e41667.

PMID: 40020107 PMC: 11875576. DOI: 10.1097/MD.0000000000041667.


ISCAZIM: Integrated statistical correlation analysis for zero-inflated microbiome data.

Fan Z, Lv J, Zhang S, Gu B, Wang C, Zhang T Heliyon. 2025; 11(1):e41184.

PMID: 39811376 PMC: 11730854. DOI: 10.1016/j.heliyon.2024.e41184.


Conformational dynamics and multi-modal interaction of Paxillin with the Focal Adhesion Targeting Domain.

Bhattacharya S, He Y, Chen Y, Mohanty A, Grishaev A, Kulkarni P bioRxiv. 2025.

PMID: 39803547 PMC: 11722443. DOI: 10.1101/2025.01.01.630265.


PyNetCor: a high-performance Python package for large-scale correlation analysis.

Long S, Xia Y, Liang L, Yang Y, Xie H, Wang X NAR Genom Bioinform. 2024; 6(4):lqae177.

PMID: 39703431 PMC: 11655297. DOI: 10.1093/nargab/lqae177.