Learning Multiplicative Interactions (many slides from Hinton)
Two different meanings of "multiplicative"
• If we take two density models and multiply together their probability distributions at each point in data-space, we get a "product of experts".
– The product of two Gaussian experts is a Gaussian.
• If we take two variables and we multiply them together to provide input to a third variable, we get a "multiplicative interaction".
– The distribution of the product of two Gaussian-distributed variables is NOT Gaussian distributed. It is a heavy-tailed distribution: one Gaussian determines the standard deviation of the other Gaussian.
– Heavy-tailed distributions are the signatures of multiplicative interactions between latent variables (see the sketch below).
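A minimal numpy sketch of the two meanings; the sample size and standard deviations are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s1, s2 = 1_000_000, 1.0, 2.0

# Meaning 1: multiplying two Gaussian DENSITIES gives a Gaussian density.
# For N(0, s1^2) * N(0, s2^2) the renormalized product is N(0, s^2) with
# precision 1/s^2 = 1/s1^2 + 1/s2^2 -- the "product of experts".
s_poe = (1.0 / s1**2 + 1.0 / s2**2) ** -0.5
print("std of the product-of-experts density:", s_poe)

# Meaning 2: multiplying two Gaussian VARIABLES gives a heavy-tailed variable.
z = rng.normal(0.0, s1, n) * rng.normal(0.0, s2, n)
excess_kurtosis = np.mean(z**4) / np.mean(z**2) ** 2 - 3.0
print("excess kurtosis of the product variable:", excess_kurtosis)  # ~6, not 0
```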
Learning multiplicative interactions
• It is fairly easy to learn multiplicative interactions if all of the variables are observed.
– This is possible if we control the variables used to create a training set (e.g. pose, lighting, identity, …).
• It is also easy to learn energy-based models in which all but one of the terms in each multiplicative interaction are observed.
– Inference is still easy.
• If more than one of the terms in each multiplicative interaction is unobserved, the interactions between hidden variables make inference difficult.
– Alternating Gibbs sampling can be used if the latent variables form a bipartite graph.
Higher-order Boltzmann machines (Sejnowski, ~1986)
• The usual energy function is quadratic in the states:
  $-E = \sum_{i<j} s_i s_j w_{ij}$
• But we could use higher-order interactions:
  $-E = \sum_{i<j<h} s_i s_j s_h w_{ijh}$
• Hidden unit h acts as a switch. When h is on, it switches in the pairwise interaction between unit i and unit j (see the sketch below).
– Units i and j can also be viewed as switches that control the pairwise interactions between j and h, or between i and h.
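A small sketch of the two energy functions on binary state vectors; the brute-force loops are for clarity only, and all names are assumptions.

```python
import numpy as np
from itertools import combinations

def quadratic_energy(s, w):
    # -E = sum_{i<j} s_i s_j w_ij: the ordinary pairwise Boltzmann machine.
    return -sum(s[i] * s[j] * w[i, j]
                for i, j in combinations(range(len(s)), 2))

def third_order_energy(s, w3):
    # -E = sum_{i<j<h} s_i s_j s_h w_ijh: unit h gates the (i, j) interaction.
    return -sum(s[i] * s[j] * s[h] * w3[i, j, h]
                for i, j, h in combinations(range(len(s)), 3))

# Switching a unit off removes every three-way term it participates in.
rng = np.random.default_rng(1)
s = rng.integers(0, 2, size=5)
w3 = rng.normal(size=(5, 5, 5))
s_off = s.copy()
s_off[4] = 0  # all w3[i, j, 4] terms now contribute nothing to the energy
```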
Using higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007)
• A global transformation specifies which pixel goes to which other pixel.
• Conversely, each pair of similar-intensity pixels, one in each image, votes for a particular global transformation.
[Figure: image(t) → transformation → image(t+1)]
Using higher-order Boltzmann machines to model image transformations
• For binary images, a simple energy function that captures all possible correlations between the components of $x$, $y$ and $h$ is
  $E(y, h; x) = \sum_{i,j,k} w_{ijk}\, x_i y_j h_k$   (1)
• Using this energy function, we can now define the joint distribution over outputs and hidden variables by exponentiating and normalizing:
  $p(y, h \mid x) = \frac{1}{Z(x)} \exp\big(E(y, h; x)\big)$   (2)
  where $Z(x) = \sum_{y,h} \exp\big(E(y, h; x)\big)$ is the normalizing constant.
• From Eqs. 1 and 2, we get the factorial conditionals
  $p(h_k = 1 \mid x, y) = \sigma\big(\textstyle\sum_{i,j} w_{ijk} x_i y_j\big)$ and $p(y_j = 1 \mid x, h) = \sigma\big(\textstyle\sum_{i,k} w_{ijk} x_i h_k\big)$.
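These conditionals are cheap to compute; here is a hedged numpy sketch, with the weight tensor indexed as W[i, j, k] (an assumption about layout).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def p_h_given_xy(x, y, W):
    # p(h_k = 1 | x, y) = sigmoid(sum_ij W_ijk x_i y_j)
    return sigmoid(np.einsum('i,j,ijk->k', x, y, W))

def p_y_given_xh(x, h, W):
    # p(y_j = 1 | x, h) = sigmoid(sum_ik W_ijk x_i h_k)
    return sigmoid(np.einsum('i,k,ijk->j', x, h, W))
```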
Making the reconstruction easier
• Condition on the first image so that only one visible group needs to be reconstructed.
– Given the hidden states and the previous image, the pixels in the second image are conditionally independent.
[Figure: image(t) → transformation → image(t+1)]
The main problem with 3-way interactions
• Energy function: $-E = \sum_{i,j,h} s_i s_j s_h\, w_{ijh}$
• There are far too many of them. We can reduce the number in several straightforward ways:
– Do dimensionality reduction on each group before the three-way interactions.
– Use spatial locality to limit the range of the three-way interactions.
• A much more interesting approach (which can be combined with the other two) is to factor the interactions so that they can be specified with fewer parameters.
– This leads to a novel type of learning module.
Factoring three-way interactions
• We use factors that correspond to 3-way outer products (see the sketch below).
• Unfactored: $-E = \sum_{i,j,h} s_i s_j s_h\, w_{ijh}$
• Factored: $-E = \sum_{i,j,h} s_i s_j s_h \sum_f w_{if}\, w_{jf}\, w_{hf}$
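A sketch of the saving from factoring; the group sizes and factor count are made up for illustration.

```python
import numpy as np

ni = nj = nh = 100
nf = 200

# Unfactored: one weight per (i, j, h) triple  -> 100^3 = 1,000,000 parameters.
# Factored:   w_ijh = sum_f w_if w_jf w_hf     -> 3 * 100 * 200 = 60,000.
Wi, Wj, Wh = (np.random.randn(n, nf) for n in (ni, nj, nh))

def factored_energy(si, sj, sh):
    # -E = sum_f (si . Wi[:, f]) (sj . Wj[:, f]) (sh . Wh[:, f])
    return -np.sum((si @ Wi) * (sj @ Wj) * (sh @ Wh))
```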
Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images (Ranzato, Krizhevsky and Hinton, 2010)
• A joint 3-way model that models the covariance structure of natural images.
• The visible units are two identical copies of the same image.
A powerful module for deep learning
• Define the energy function in terms of 3-way multiplicative interactions between two visible binary units, $v_i$ and $v_j$, and one hidden binary unit $h_k$:
  $E(v, h) = -\sum_{i,j,k} v_i v_j h_k\, w_{ijk}$
• Model the three-way weights as a sum of "factors", $f$, each of which is a three-way outer product:
  $w_{ijk} = \sum_f B_{if}\, C_{jf}\, P_{fk}$
• Since the factors are connected twice to the same image through matrices $B$ and $C$, it is natural to tie their weights ($B = C$), further reducing the number of parameters.
A powerful module for deep learning
• So the energy function becomes:
  $E(v, h) = -\sum_f \Big(\sum_i C_{if} v_i\Big)^2 \Big(\sum_k P_{fk} h_k\Big) - \sum_k b_k h_k$
• The parameters of the model can be learned by maximizing the log-likelihood, whose gradient is given by:
  $\frac{\partial \log p(v)}{\partial \theta} = \Big\langle -\frac{\partial E}{\partial \theta} \Big\rangle_{\text{data}} - \Big\langle -\frac{\partial E}{\partial \theta} \Big\rangle_{\text{model}}$
• The hidden units are conditionally independent given the states of the visible units, and their binary states can be sampled using:
  $p(h_k = 1 \mid v) = \sigma\Big(\sum_f P_{fk} \big(\sum_i C_{if} v_i\big)^2 + b_k\Big)$
• However, given the hidden states, the visible units are no longer independent (see the sketch below).
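A sketch of the tied, factored energy and the hidden conditional, using the names C (pixels-to-factors), P (factors-to-hiddens) and b (hidden biases) from the reconstruction above.

```python
import numpy as np

def energy(v, h, C, P, b):
    # E(v, h) = -sum_f (sum_i C_if v_i)^2 (sum_k P_fk h_k) - sum_k b_k h_k
    f = (C.T @ v) ** 2            # squared factor activities
    return -f @ (P @ h) - b @ h

def p_h_given_v(v, C, P, b):
    # p(h_k = 1 | v) = sigmoid(sum_f P_fk (sum_i C_if v_i)^2 + b_k)
    f = (C.T @ v) ** 2
    return 1.0 / (1.0 + np.exp(-(P.T @ f + b)))
```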
Producing reconstructions using hybrid Monte Carlo
• Integrate out the hidden units and run the hybrid Monte Carlo algorithm (HMC) on the free energy:
  $F(v) = -\log \sum_h \exp\big(-E(v, h)\big) = -\sum_k \log\Big(1 + \exp\big(\textstyle\sum_f P_{fk}(\sum_i C_{if} v_i)^2 + b_k\big)\Big)$
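A compact HMC sketch on this free energy. The quadratic containment term 0.5·v·v (added so the density over real-valued v is normalizable), the step size and the leapfrog count are assumptions, not from the slides.

```python
import numpy as np

def free_energy(v, C, P, b):
    f = (C.T @ v) ** 2
    return 0.5 * v @ v - np.sum(np.logaddexp(0.0, P.T @ f + b))

def grad_free_energy(v, C, P, b):
    c = C.T @ v
    h = 1.0 / (1.0 + np.exp(-(P.T @ c**2 + b)))   # expected hidden states
    return v - 2.0 * C @ (c * (P @ h))

def hmc_step(v, C, P, b, n_leapfrog=20, eps=0.01, rng=np.random.default_rng()):
    p = rng.normal(size=v.shape)                   # fresh momentum
    v_new = v.copy()
    p_new = p - 0.5 * eps * grad_free_energy(v_new, C, P, b)
    for _ in range(n_leapfrog):                    # leapfrog integration
        v_new = v_new + eps * p_new
        p_new = p_new - eps * grad_free_energy(v_new, C, P, b)
    p_new = p_new + 0.5 * eps * grad_free_energy(v_new, C, P, b)
    dH = (free_energy(v_new, C, P, b) - free_energy(v, C, P, b)
          + 0.5 * (p_new @ p_new - p @ p))
    return v_new if rng.random() < np.exp(-dH) else v  # Metropolis accept/reject
```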
Modeling the joint density of two images under a variety of transformations (Hinton et al., 2011)
• Describes a generative model of the relationship between two images.
• The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs.
Model
• Given two real-valued images $x$ and $y$, define the matching score of triplets $(x, y, h)$:
  $S(x, y, h) = \sum_f \Big(\sum_i W^x_{if} x_i\Big)\Big(\sum_j W^y_{jf} y_j\Big)\Big(\sum_k W^h_{kf} h_k\Big)$
• Add bias terms to the matching score and get the energy function:
  $E(x, y, h) = -S(x, y, h) + \tfrac{1}{2}\|x\|^2 + \tfrac{1}{2}\|y\|^2 - \sum_k b_k h_k$   (1)
• Exponentiate and normalize the negative energy function:
  $p(x, y, h) = \frac{1}{Z} \exp\big(-E(x, y, h)\big)$   (2)
Model
• Marginalize over $h$ to get the distribution over an image pair $(x, y)$:
  $p(x, y) = \frac{1}{Z}\sum_h \exp\big(-E(x, y, h)\big)$
• And then we can get
  $p(h_k = 1 \mid x, y) = \sigma\Big(\sum_f W^h_{kf}\big(\textstyle\sum_i W^x_{if} x_i\big)\big(\textstyle\sum_j W^y_{jf} y_j\big) + b_k\Big)$   (3)
  $p(x \mid y, h) = \mathcal{N}\Big(x;\ \textstyle\sum_f W^x_{\cdot f}\big(\sum_j W^y_{jf} y_j\big)\big(\sum_k W^h_{kf} h_k\big),\ I\Big)$   (4)
  $p(y \mid x, h) = \mathcal{N}\Big(y;\ \textstyle\sum_f W^y_{\cdot f}\big(\sum_i W^x_{if} x_i\big)\big(\sum_k W^h_{kf} h_k\big),\ I\Big)$   (5)
• This shows that among the three sets of variables, computation of the conditional distribution of any one group, given the other two, is tractable (see the sketch below).
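A numpy sketch of Eqs. 3 and 5; the shapes (Wx: n_x×n_f, Wy: n_y×n_f, Wh: n_h×n_f) and the unit-variance Gaussians are assumptions consistent with the reconstruction above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def p_h_given_xy(x, y, Wx, Wy, Wh, b):
    # Eq. 3: each hidden unit pools the products of the two factor projections.
    return sigmoid(Wh @ ((Wx.T @ x) * (Wy.T @ y)) + b)

def mean_y_given_xh(x, h, Wx, Wy, Wh):
    # Eq. 5: Gaussian mean of y given (x, h), unit variance assumed.
    return Wy @ ((Wx.T @ x) * (Wh.T @ h))
```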
Three-way Contrastive Divergence
Prerequisites: a data set of image pairs, a learning rate
repeat
  for each training pair (x, y) do
    compute p(h | x, y) using Eq. 3; accumulate the positive-phase statistics
    for each parameter: perform the positive-phase update
    sample h from p(h | x, y)
    if reconstructing x this iteration then
      sample x̃ from p(x | y, h) (Eq. 4); set ỹ = y
      sample h̃ from p(h | x̃, ỹ)
    else
      sample ỹ from p(y | x, h) (Eq. 5); set x̃ = x
      sample h̃ from p(h | x̃, ỹ)
    end if
    for each parameter: compute the negative-phase statistics and perform the negative-phase update
    renormalize the filters
  end for
until the convergence criterion is met
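A hedged Python sketch of one three-way CD-1 step built from Eqs. 3-5. The alternation flag, the in-place updates and the filter renormalization follow the translated pseudocode above, but every detail beyond it (learning-rate handling, unit variances, the normalization floor) is an assumption.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(x, y, Wx, Wy, Wh, b, lr, rng, reconstruct_x):
    # Positive phase: hidden probabilities driven by the data pair (Eq. 3).
    fx, fy = Wx.T @ x, Wy.T @ y
    h_pos = sigmoid(Wh @ (fx * fy) + b)

    # Negative phase: sample h, reconstruct one image (Eq. 4 or 5), infer h again.
    h = (rng.random(h_pos.shape) < h_pos).astype(float)
    fh = Wh.T @ h
    if reconstruct_x:
        x_t, y_t = Wx @ (fy * fh), y     # Gaussian mean of x given (y, h)
    else:
        x_t, y_t = x, Wy @ (fx * fh)     # Gaussian mean of y given (x, h)
    fxt, fyt = Wx.T @ x_t, Wy.T @ y_t
    h_neg = sigmoid(Wh @ (fxt * fyt) + b)

    # Updates: positive statistics minus negative statistics, per parameter.
    Wx += lr * (np.outer(x, fy * (Wh.T @ h_pos)) - np.outer(x_t, fyt * (Wh.T @ h_neg)))
    Wy += lr * (np.outer(y, fx * (Wh.T @ h_pos)) - np.outer(y_t, fxt * (Wh.T @ h_neg)))
    Wh += lr * (np.outer(h_pos, fx * fy) - np.outer(h_neg, fxt * fyt))
    b += lr * (h_pos - h_neg)

    # Renormalize the image filters, as the pseudocode's last step suggests.
    for W in (Wx, Wy):
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-8)
```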