
Training Products of Experts by Minimizing Contrastive Divergence

Overview
Journal Neural Comput
Publisher MIT Press
Date 2002 Aug 16
PMID 12180402
Citations 218
Authors
Geoffrey E Hinton
Affiliations
Gatsby Computational Neuroscience Unit, University College London, London, UK
Abstract

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
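
To make the learning rule concrete, here is a minimal sketch of one-step contrastive divergence (CD-1) for a restricted Boltzmann machine, the binary product of experts most commonly trained this way. Everything below is an illustrative assumption rather than a detail from the paper: the NumPy implementation, the name cd1_update, and the toy sizes and learning rate. The update follows the abstract's recipe of comparing the experts' statistics on the data against statistics gathered after a single Gibbs sampling step, instead of running the Markov chain to equilibrium as maximum likelihood would require.

# Minimal CD-1 sketch for a restricted Boltzmann machine (a binary PoE).
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v_data, lr=0.1):
    # Positive phase: infer each expert's latent variable from the data.
    # Conditional independence makes this one matrix product per batch.
    p_h_data = sigmoid(v_data @ W + b_hid)
    h_sample = (rng.random(p_h_data.shape) < p_h_data).astype(float)

    # Negative phase: a single Gibbs step yields a "reconstruction",
    # sidestepping the intractable gradient of the renormalization term.
    p_v_recon = sigmoid(h_sample @ W.T + b_vis)
    p_h_recon = sigmoid(p_v_recon @ W + b_hid)

    # Approximate gradient: data statistics minus reconstruction statistics.
    n = v_data.shape[0]
    W += lr * (v_data.T @ p_h_data - p_v_recon.T @ p_h_recon) / n
    b_vis += lr * (v_data - p_v_recon).mean(axis=0)
    b_hid += lr * (p_h_data - p_h_recon).mean(axis=0)
    return W, b_vis, b_hid

# Toy usage: 6 visible units, 3 hidden "experts", random binary data.
W = 0.01 * rng.standard_normal((6, 3))
b_vis, b_hid = np.zeros(6), np.zeros(3)
v_data = (rng.random((32, 6)) < 0.5).astype(float)
for _ in range(100):
    W, b_vis, b_hid = cd1_update(W, b_vis, b_hid, v_data)

Using reconstruction probabilities rather than fresh binary samples in the negative phase is a standard variance-reduction choice; the CD-n generalization discussed in the paper simply takes more Gibbs steps before collecting the negative statistics.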

Citing Articles

Temporal Contrastive Learning through implicit non-equilibrium memory.

Falk M, Strupp A, Scellier B, Murugan A. Nat Commun. 2025; 16(1):2163.

PMID: 40038254. PMC: 11880436. DOI: 10.1038/s41467-025-57043-x.


A survey on gait recognition against occlusion: taxonomy, dataset and methodology.

Li T, Ma W, Zheng Y, Fan X, Yang G, Wang L. PeerJ Comput Sci. 2025; 10:e2602.

PMID: 39896378. PMC: 11784899. DOI: 10.7717/peerj-cs.2602.


AI-driven multi-omics integration for multi-scale predictive modeling of genotype-environment-phenotype relationships.

Wu Y, Xie L. Comput Struct Biotechnol J. 2025; 27:265-277.

PMID: 39886532. PMC: 11779603. DOI: 10.1016/j.csbj.2024.12.030.


Investigating the intrinsic top-down dynamics of deep generative models.

Tausani L, Testolin A, Zorzi M. Sci Rep. 2025; 15(1):2875.

PMID: 39843473. PMC: 11754800. DOI: 10.1038/s41598-024-85055-y.


Parameter Estimation Procedures for Exponential-family Random Graph Models on Count-valued Networks: A Comparative Simulation Study.

Huang P, Butts C. Soc Networks. 2025; 76:51-67.

PMID: 39830820. PMC: 11741508. DOI: 10.1016/j.socnet.2023.07.001.