
Estimating the Mutual Information Between Two Discrete, Asymmetric Variables with Limited Samples

Overview
Journal Entropy (Basel)
Publisher MDPI
Date 2020 Dec 3
PMID 33267337
Citations 6
Abstract

Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, remains biased in the severely undersampled regime. Here, we propose an alternative estimator that is applicable whenever the marginal distribution of one of the two variables (the one with minimal entropy) is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator with very low bias that outperforms previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistic determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the high-entropy variable registers coincidences.
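The abstract's starting point, that mutual information is a difference of entropies, can be illustrated with the naive plug-in (maximum-likelihood) estimator, which is the baseline the paper improves on. This is a minimal sketch, not the Bayesian estimator proposed in the article; the identity I(X;Y) = H(X) + H(Y) - H(X,Y) is standard.

```python
import numpy as np

def entropy(counts):
    """Plug-in (maximum-likelihood) entropy in bits from an array of counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return -np.sum(p * np.log2(p))

def plugin_mutual_information(joint):
    """Naive MI estimate from a joint count table: I = H(X) + H(Y) - H(X,Y).

    This estimator is biased upward when the joint table is undersampled,
    which is precisely the regime the paper's estimator targets.
    """
    joint = np.asarray(joint, dtype=float)
    h_x = entropy(joint.sum(axis=1))   # marginal entropy of X (rows)
    h_y = entropy(joint.sum(axis=0))   # marginal entropy of Y (columns)
    h_xy = entropy(joint.ravel())      # joint entropy
    return h_x + h_y - h_xy

print(plugin_mutual_information([[5, 5], [5, 5]]))    # independent -> 0.0
print(plugin_mutual_information([[10, 0], [0, 10]]))  # fully coupled -> 1.0
```

With few samples per joint state, spurious structure in the count table inflates this estimate, which is why bias-corrected or Bayesian alternatives such as the one proposed here are needed.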

Citing Articles

GWLD: an R package for genome-wide linkage disequilibrium analysis.

Zhang R, Wu H, Li Y, Huang Z, Yin Z, Yang C G3 (Bethesda). 2023; 13(9).

PMID: 37431944 PMC: 10468308. DOI: 10.1093/g3journal/jkad154.


Phosphoproteomics data-driven signalling network inference: Does it work?

Sriraja L, Werhli A, Petsalaki E Comput Struct Biotechnol J. 2023; 21:432-443.

PMID: 36618990 PMC: 9798138. DOI: 10.1016/j.csbj.2022.12.010.


On Generalized Schürmann Entropy Estimators.

Grassberger P Entropy (Basel). 2022; 24(5).

PMID: 35626564 PMC: 9141067. DOI: 10.3390/e24050680.


Inferring a Property of a Large System from a Small Number of Samples.

Hernandez D, Samengo I Entropy (Basel). 2022; 24(1).

PMID: 35052151 PMC: 8775033. DOI: 10.3390/e24010125.


Entropy Estimation Using a Linguistic Zipf-Mandelbrot-Li Model for Natural Sequences.

Back A, Wiles J Entropy (Basel). 2021; 23(9).

PMID: 34573725 PMC: 8468050. DOI: 10.3390/e23091100.


References
1.
Still S, Bialek W. How many clusters? An information-theoretic perspective. Neural Comput. 2004; 16(12):2483-506. DOI: 10.1162/0899766042321751.

2.
Pola G, Thiele A, Hoffmann K, Panzeri S. An exact method to quantify the information transmitted by different mechanisms of correlational coding. Network. 2003; 14(1):35-60. DOI: 10.1088/0954-898x/14/1/303.

3.
Maidana Capitan M, Kropff E, Samengo I. Information-Theoretical Analysis of the Neural Code in the Rodent Temporal Lobe. Entropy (Basel). 2020; 20(8). PMC: 7513095. DOI: 10.3390/e20080571.

4.
Safaai H, Onken A, Harvey C, Panzeri S. Information estimation using nonparametric copulas. Phys Rev E. 2019; 98(5). PMC: 6458593. DOI: 10.1103/PhysRevE.98.053302.

5.
Montemurro M, Senatore R, Panzeri S. Tight data-robust bounds to mutual information combining shuffling and model selection techniques. Neural Comput. 2007; 19(11):2913-57. DOI: 10.1162/neco.2007.19.11.2913.