Learning Multiplicative Interactions (many slides from Hinton)

shina

Presentation Transcript


  1. Learning Multiplicative Interactions many slides from Hinton

  2. Two different meanings of “multiplicative”
     • If we take two density models and multiply together their probability distributions at each point in data-space, we get a “product of experts”.
     • The product of two Gaussian experts is a Gaussian.
     • If we take two variables and we multiply them together to provide input to a third variable, we get a “multiplicative interaction”.
     • The distribution of the product of two Gaussian-distributed variables is NOT Gaussian distributed. It is a heavy-tailed distribution. One Gaussian determines the standard deviation of the other Gaussian.
     • Heavy-tailed distributions are the signatures of multiplicative interactions between latent variables.
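The two meanings can be checked numerically. The sketch below is illustrative only: the function names, sample size, and seed are assumptions, not part of the slides. It multiplies two 1-D Gaussian densities (the result is again Gaussian, with precisions adding) and then multiplies two Gaussian samples (the result has kurtosis well above the Gaussian value of 3).

```python
import numpy as np

def product_of_gaussian_experts(mu1, var1, mu2, var2):
    """Mean and variance of the renormalized product of two 1-D Gaussian densities."""
    precision = 1.0 / var1 + 1.0 / var2            # precisions add
    mean = (mu1 / var1 + mu2 / var2) / precision   # precision-weighted mean
    return mean, 1.0 / precision

def kurtosis(samples):
    """Fourth standardized moment; equals 3 for a Gaussian."""
    centered = samples - samples.mean()
    return np.mean(centered**4) / np.mean(centered**2) ** 2

rng = np.random.default_rng(0)
a = rng.standard_normal(200_000)
b = rng.standard_normal(200_000)

# Meaning 1: product of two densities stays Gaussian
poe_mean, poe_var = product_of_gaussian_experts(0.0, 1.0, 2.0, 1.0)

# Meaning 2: product of two variables is heavy-tailed
gaussian_kurtosis = kurtosis(a)       # close to 3
product_kurtosis = kurtosis(a * b)    # well above 3
```

For two independent standard normals the theoretical kurtosis of the product is 9, so the sample estimate should land far from 3.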

  3. Learning multiplicative interactions
     • It is fairly easy to learn multiplicative interactions if all of the variables are observed.
     • This is possible if we control the variables used to create a training set (e.g. pose, lighting, identity …).
     • It is also easy to learn energy-based models in which all but one of the terms in each multiplicative interaction are observed.
     • Inference is still easy.
     • If more than one of the terms in each multiplicative interaction are unobserved, the interactions between hidden variables make inference difficult.
     • Alternating Gibbs can be used if the latent variables form a bi-partite graph.

  4. Higher order Boltzmann machines
     • (Sejnowski, ~1986)
     • The usual energy function is quadratic in the states: $E = \text{bias terms} - \sum_{i<j} s_i s_j w_{ij}$
     • But we could use higher order interactions: $E = \text{bias terms} - \sum_{i,j,h} s_i s_j s_h w_{ijh}$
     • Hidden unit h acts as a switch. When h is on, it switches in the pairwise interaction between unit i and unit j.
     – Units i and j can also be viewed as switches that control the pairwise interactions between j and h or between i and h.
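A minimal sketch of the switching behaviour, assuming the three-way energy has the form -sum over i, j, h of s_i s_j s_h w_ijh with bias terms omitted. The array sizes and helper names are hypothetical.

```python
import numpy as np

def three_way_energy(s_i, s_j, s_h, w):
    """E = -sum_{i,j,h} s_i s_j s_h w_ijh (bias terms omitted)."""
    return -np.einsum("i,j,h,ijh->", s_i, s_j, s_h, w)

def switched_pairwise_weights(s_h, w):
    """Effective i-j weight matrix selected by the active hidden units."""
    return np.einsum("h,ijh->ij", s_h, w)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4, 3))
s_i = rng.integers(0, 2, size=4).astype(float)
s_j = rng.integers(0, 2, size=4).astype(float)
s_h = np.array([0.0, 1.0, 0.0])   # only hidden unit 1 is on

# With a single hidden unit on, the model reduces to an ordinary
# pairwise energy whose weights are the slice w[:, :, 1].
pairwise = switched_pairwise_weights(s_h, w)
energy = three_way_energy(s_i, s_j, s_h, w)
```

Turning on a different hidden unit selects a different slice of `w`, which is the sense in which h "switches in" a pairwise interaction.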

  5. Using higher-order Boltzmann machines to model image transformations
     • (Memisevic and Hinton, 2007)
     • A global transformation specifies which pixel goes to which other pixel.
     • Conversely, each pair of similar intensity pixels, one in each image, votes for a particular global transformation.
     [Figure labels: image transformation, image(t), image(t+1)]

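  6. Using higher-order Boltzmann machines to model image transformations
     • For binary images, a simple energy function that captures all possible correlations between the components of the input image, the output image, and the hidden variables is: [equation (1) missing in transcript]
     • Using this energy function, we can now define the joint distribution over outputs and hidden variables by exponentiating and normalizing: [equation (2) missing in transcript], where [normalization term missing in transcript]
     • From Eqs. 1 and 2, we get: [equation missing in transcript]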
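The equations for this slide did not survive in the transcript. The sketch below therefore assumes a form: an energy E(y, h; x) = -sum over i, j, k of W_ijk x_i y_j h_k, with x the input image, y the output image, and h the hidden units. All names and sizes are hypothetical. It exponentiates and normalizes by brute-force enumeration, then checks that one hidden conditional matches the logistic form this assumed energy implies.

```python
import itertools
import numpy as np

def energy(x, y, h, W):
    """Assumed form: E(y, h; x) = -sum_ijk W_ijk x_i y_j h_k."""
    return -np.einsum("i,j,k,ijk->", x, y, h, W)

def joint_given_x(x, W):
    """Enumerate p(y, h | x) for small binary y and h."""
    n_y, n_h = W.shape[1], W.shape[2]
    states = {}
    for y in itertools.product([0.0, 1.0], repeat=n_y):
        for h in itertools.product([0.0, 1.0], repeat=n_h):
            states[(y, h)] = np.exp(-energy(x, np.array(y), np.array(h), W))
    Z = sum(states.values())               # partition function, depends on x
    return {k: v / Z for k, v in states.items()}

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2, 2))
x = np.array([1.0, 0.0, 1.0])
p = joint_given_x(x, W)

# Conditional p(h_0 = 1 | x, y) by enumeration versus the closed form
y = (1.0, 0.0)
p_y = sum(v for (yy, _), v in p.items() if yy == y)
p_h0_on = sum(v for (yy, h), v in p.items() if yy == y and h[0] == 1.0) / p_y
closed_form = sigmoid(np.einsum("i,j,ijk->k", x, np.array(y), W)[0])
```

Enumeration is only feasible for toy sizes; it is used here purely to confirm the normalization and the conditional.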

  7. Making the reconstruction easier
     • Condition on the first image so that only one visible group needs to be reconstructed.
     – Given the hidden states and the previous image, the pixels in the second image are conditionally independent.
     [Figure labels: image transformation, image(t), image(t+1)]
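Conditioning on the first image makes a reconstruction a single up-down pass. The following is a sketch under assumptions: binary units, no biases, and a three-way weight tensor `W` indexed as (first image, second image, hidden). These names are illustrative, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def reconstruct(x, y, W, rng):
    """Sample h from p(h | x, y), then return p(y_j = 1 | x, h) for every pixel."""
    p_h = sigmoid(np.einsum("i,j,ijk->k", x, y, W))
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Given x and h, each pixel of the second image gets its own
    # independent Bernoulli probability, so no iteration is needed.
    p_y = sigmoid(np.einsum("i,k,ijk->j", x, h, W))
    return h, p_y

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(6, 6, 4))
x = rng.integers(0, 2, size=6).astype(float)   # image(t)
y = rng.integers(0, 2, size=6).astype(float)   # image(t+1)
h, p_y = reconstruct(x, y, W, rng)
```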

  8. The main problem with 3-way interactions
     • Energy function: $E = -\sum_{i,j,h} s_i s_j s_h w_{ijh}$
     • There are far too many of them.
     • We can reduce the number in several straightforward ways:
     • Do dimensionality reduction on each group before the three-way interactions.
     • Use spatial locality to limit the range of the three-way interactions.
     • A much more interesting approach (which can be combined with the other two) is to factor the interactions so that they can be specified with fewer parameters.
     • This leads to a novel type of learning module.
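A quick count shows why the unfactored tensor is a problem. The sizes below (two 16x16 patches, 100 hidden units, 200 factors) are hypothetical numbers chosen only for illustration.

```python
def unfactored_params(n_i, n_j, n_h):
    """One weight per (i, j, h) triple."""
    return n_i * n_j * n_h

def factored_params(n_i, n_j, n_h, n_f):
    """Three weight matrices, each with one column per factor."""
    return n_f * (n_i + n_j + n_h)

full = unfactored_params(256, 256, 100)       # 6,553,600 weights
fact = factored_params(256, 256, 100, 200)    # 122,400 weights
```

The unfactored count grows with the product of the group sizes, while the factored count grows with their sum.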

  9. Factoring three-way interactions
     • We use factors that correspond to 3-way outer-products.
     • Unfactored: $E = -\sum_{i,j,h} s_i s_j s_h w_{ijh}$
     • Factored: $E = -\sum_f \sum_{i,j,h} s_i s_j s_h w_{if} w_{jf} w_{hf}$
     [Figure labels: $w_{if}$, $w_{jf}$, $w_{hf}$]
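The factored energy never needs the full tensor. As a small check (sizes and seed are arbitrary assumptions), the sketch below builds the tensor implied by the factors and confirms that the energy computed from three per-factor projections is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_j, n_h, n_f = 5, 5, 3, 4
w_if = rng.normal(size=(n_i, n_f))
w_jf = rng.normal(size=(n_j, n_f))
w_hf = rng.normal(size=(n_h, n_f))
s_i, s_j, s_h = (rng.integers(0, 2, size=n).astype(float)
                 for n in (n_i, n_j, n_h))

# Full tensor implied by the factors: w_ijh = sum_f w_if w_jf w_hf
w_ijh = np.einsum("if,jf,hf->ijh", w_if, w_jf, w_hf)
energy_full = -np.einsum("i,j,h,ijh->", s_i, s_j, s_h, w_ijh)

# Same energy from three projections onto the factors,
# without ever forming w_ijh
energy_factored = -np.sum((s_i @ w_if) * (s_j @ w_jf) * (s_h @ w_hf))
```

Each factor multiplies three scalar projections, which is the "3-way outer product" structure viewed from the energy side.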

  10. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images
     • (Ranzato, Krizhevsky and Hinton, 2010)
     • Joint 3-way model
     • Model the covariance structure of natural images. The visible units are two identical copies of the same image.

  11. A powerful module for deep learning
     • Define the energy function in terms of 3-way multiplicative interactions between two visible binary units and one hidden binary unit: [equation missing in transcript]
     • Model the three-way weights as a sum of “factors”, f, each of which is a three-way outer product: [equation missing in transcript]
     • Because the factors are connected twice to the same image through matrices B and C, it is natural to tie their weights, further reducing the number of parameters: [equation missing in transcript]
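One plausible reading of the three steps on this slide is written out below. This is a reconstruction under assumptions, not the slide's own equations: B and C are the matrices named on the slide, while the symbols v, h, W, and the hidden-to-factor matrix P are hypothetical names, and bias terms are omitted.

```latex
% Assumed notation: v = visible units, h = hidden units,
% B, C = visible-to-factor matrices, P = hidden-to-factor matrix (hypothetical name)
\begin{align}
E(v, h) &= -\sum_{i,j,k} v_i \, v_j \, h_k \, W_{ijk}
  && \text{three-way energy} \\
W_{ijk} &= \sum_f B_{if} \, C_{jf} \, P_{kf}
  && \text{sum of three-way outer products} \\
E(v, h) &= -\sum_f \Big( \sum_i v_i \, C_{if} \Big)^{2} \Big( \sum_k h_k \, P_{kf} \Big)
  && \text{after tying } B = C
\end{align}
```

Substituting the second line into the first and setting B = C gives the third line, since both visible sums become the same projection of the image onto factor f.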

  12. A powerful module for deep learning
     • So the energy function becomes: [equation missing in transcript]
     • The parameters of the model can be learned by maximizing the log likelihood, whose gradient is given by: [equation missing in transcript]
     • The hidden units are conditionally independent given the states of the visible units, and their binary states can be sampled using: [equation missing in transcript]
     • However, given the hidden states, the visible units are no longer independent.
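The conditional independence of the hidden units can be sketched as follows, assuming a tied-weight energy of the form -sum over f of (v·C_f)^2 (h·P_f) minus a hidden bias term. The matrix names `C` and `P`, the bias `b_h`, and all sizes are assumptions for illustration.

```python
import numpy as np

def energy(v, h, C, P, b_h):
    """Assumed tied-weight energy: -sum_f (v.C_f)^2 (h.P_f) - b_h.h"""
    factor_out = v @ C                      # one filter response per factor
    return -np.sum(factor_out**2 * (h @ P)) - b_h @ h

def hidden_probs(v, C, P, b_h):
    """p(h_k = 1 | v); each unit depends only on v, so the units are independent."""
    return 1.0 / (1.0 + np.exp(-(P @ (v @ C) ** 2 + b_h)))

def sample_hidden(v, C, P, b_h, rng):
    p = hidden_probs(v, C, P, b_h)
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
n_v, n_f, n_h = 6, 4, 3
C = rng.normal(scale=0.5, size=(n_v, n_f))
P = rng.normal(scale=0.5, size=(n_h, n_f))
b_h = rng.normal(size=n_h)
v = rng.integers(0, 2, size=n_v).astype(float)

p_h = hidden_probs(v, C, P, b_h)
h = sample_hidden(v, C, P, b_h, rng)
```

The visible units enter the energy through squared filter responses, which couples them to one another once h is fixed. That coupling is why no equally simple sampler exists in the other direction.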

  13. Producing reconstructions using hybrid Monte Carlo
     • Integrate out the hidden units and use the hybrid Monte Carlo (HMC) algorithm on the free energy: [equation missing in transcript]
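Integrating out binary hidden units has a closed form, and HMC needs the gradient of the resulting free energy. The sketch below assumes the same tied-weight energy as above (names `C`, `P`, `b_h` are hypothetical, visible biases omitted). It computes the free energy analytically, checks it against brute-force summation over h, and provides the gradient a leapfrog step would use. It does not implement the HMC sampler itself.

```python
import itertools
import numpy as np

def energy(v, h, C, P, b_h):
    """Assumed tied-weight energy: -sum_f (v.C_f)^2 (h.P_f) - b_h.h"""
    return -np.sum((v @ C) ** 2 * (h @ P)) - b_h @ h

def free_energy(v, C, P, b_h):
    """F(v) = -log sum_h exp(-E(v, h)), summed analytically over binary h."""
    act = P @ (v @ C) ** 2 + b_h
    return -np.sum(np.logaddexp(0.0, act))      # -sum_k log(1 + e^{act_k})

def free_energy_grad(v, C, P, b_h):
    """dF/dv, the force term a leapfrog step would use."""
    factor_out = v @ C
    act = P @ factor_out**2 + b_h
    gate = 1.0 / (1.0 + np.exp(-act))           # p(h_k = 1 | v)
    return -2.0 * C @ (factor_out * (gate @ P))

def free_energy_brute_force(v, C, P, b_h):
    """Same quantity by explicit enumeration of all hidden states."""
    terms = [np.exp(-energy(v, np.array(h), C, P, b_h))
             for h in itertools.product([0.0, 1.0], repeat=P.shape[0])]
    return -np.log(np.sum(terms))

rng = np.random.default_rng(0)
n_v, n_f, n_h = 5, 4, 3
C = rng.normal(scale=0.5, size=(n_v, n_f))
P = rng.normal(scale=0.5, size=(n_h, n_f))
b_h = rng.normal(size=n_h)
v = rng.normal(size=n_v)

F = free_energy(v, C, P, b_h)
grad = free_energy_grad(v, C, P, b_h)
```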

  14. Modeling the joint density of two images under a variety of transformations
     • (Hinton et al., 2011)
     • Describes a generative model of the relationship between two images.
     • The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs.

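  15. Model
     • Given two real-valued images, define the matching score of triplets (the two images and the hidden variables): [equation missing in transcript]
     • Add bias terms to the matching score to get the energy function: [equation (1) missing in transcript]
     • Exponentiate and normalize the energy function: [equation missing in transcript]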

  16. Model
     • Marginalize over the hidden variables to get the distribution over an image pair: [equation missing in transcript]
     • And then we can get: [equations (3), (4), (5) missing in transcript]
     • This shows that among the three sets of variables, computation of the conditional distribution of any one group, given the other two, is tractable.
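The tractability claim can be illustrated with a sketch. It assumes a factored matching score of the form sum over f of (x·Wx_f)(y·Wy_f)(h·Wh_f), binary hidden units, and unit-variance Gaussian image units, so that each image conditional is a Gaussian with the mean computed below. The matrix names, bias names, and sizes are hypothetical, and the transcript's own equations (3) to (5) may differ in detail.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_given_images(x, y, Wx, Wy, Wh, b_h):
    """p(h_k = 1 | x, y): independent logistic units."""
    return sigmoid(Wh @ ((x @ Wx) * (y @ Wy)) + b_h)

def x_mean_given(y, h, Wx, Wy, Wh, a_x):
    """Mean of the Gaussian p(x | y, h) under the unit-variance assumption."""
    return a_x + Wx @ ((y @ Wy) * (h @ Wh))

def y_mean_given(x, h, Wx, Wy, Wh, a_y):
    """Mean of the Gaussian p(y | x, h) under the unit-variance assumption."""
    return a_y + Wy @ ((x @ Wx) * (h @ Wh))

rng = np.random.default_rng(0)
n_x, n_y, n_h, n_f = 8, 8, 5, 6
Wx, Wy, Wh = (rng.normal(scale=0.3, size=(n, n_f)) for n in (n_x, n_y, n_h))
a_x, a_y, b_h = np.zeros(n_x), np.zeros(n_y), np.zeros(n_h)
x, y = rng.normal(size=n_x), rng.normal(size=n_y)

p_h = hidden_given_images(x, y, Wx, Wy, Wh, b_h)
h = (rng.random(n_h) < p_h).astype(float)
x_mean = x_mean_given(y, h, Wx, Wy, Wh, a_x)
y_mean = y_mean_given(x, h, Wx, Wy, Wh, a_y)
```

Each conditional costs three matrix-vector products, because fixing two groups makes the score linear in the third.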

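  17. Three-way contrastive divergence
     Require: dataset, learning rate
     repeat
       for … from 1 to … do
         compute …
         let …
         for each …: perform positive-phase update
         sample … from …
         sample … from …
         if … then
           sample … from …, let …
           sample … from …, let …
         else
           sample … from …, let …
           sample … from …, let …
         end if
         let …
         for each …: compute …
         perform negative-phase update
         renormalize …
       end for
     until the convergence condition is met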
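Because the symbols in the pseudocode were lost, the following is only a sketch of one update in the spirit of the listed steps, not the authors' algorithm. It assumes a factored matching score, binary hidden units, unit-variance Gaussian image units, no bias terms, a fair coin for the reconstruction order, and a renormalization that rescales each factor column to the mean column norm. Every name here is hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def score_grads(x, y, h, Wx, Wy, Wh):
    """Gradients of the matching score sum_f (x.Wx_f)(y.Wy_f)(h.Wh_f)."""
    fx, fy, fh = x @ Wx, y @ Wy, h @ Wh
    return np.outer(x, fy * fh), np.outer(y, fx * fh), np.outer(h, fx * fy)

def renormalize(W):
    """Rescale each factor's column to the mean column norm (assumed scheme)."""
    norms = np.linalg.norm(W, axis=0)
    return W * (norms.mean() / (norms + 1e-12))

def cd_step(x, y, Wx, Wy, Wh, lr, rng):
    # Positive phase: infer hidden units from the observed image pair
    h_pos = sigmoid(Wh @ ((x @ Wx) * (y @ Wy)))
    h = (rng.random(h_pos.shape) < h_pos).astype(float)
    fh = h @ Wh
    # Reconstruct both images, choosing at random which one goes first
    if rng.random() > 0.5:
        x_neg = Wx @ ((y @ Wy) * fh) + rng.standard_normal(x.shape)
        y_neg = Wy @ ((x_neg @ Wx) * fh) + rng.standard_normal(y.shape)
    else:
        y_neg = Wy @ ((x @ Wx) * fh) + rng.standard_normal(y.shape)
        x_neg = Wx @ ((y_neg @ Wy) * fh) + rng.standard_normal(x.shape)
    h_neg = sigmoid(Wh @ ((x_neg @ Wx) * (y_neg @ Wy)))
    # Negative phase: data statistics minus reconstruction statistics
    pos = score_grads(x, y, h_pos, Wx, Wy, Wh)
    neg = score_grads(x_neg, y_neg, h_neg, Wx, Wy, Wh)
    return tuple(renormalize(W + lr * (p - n))
                 for W, p, n in zip((Wx, Wy, Wh), pos, neg))

rng = np.random.default_rng(0)
n_x, n_y, n_h, n_f = 8, 8, 5, 6
Wx, Wy, Wh = (rng.normal(scale=0.1, size=(n, n_f)) for n in (n_x, n_y, n_h))
x, y = rng.normal(size=n_x), rng.normal(size=n_y)
Wx_new, Wy_new, Wh_new = cd_step(x, y, Wx, Wy, Wh, lr=0.01, rng=rng)
```

In practice the step would be applied over mini-batches and repeated until a convergence criterion is met, as the outer loop of the pseudocode indicates.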

  18. Thank you
