CS623: Introduction to Computing with Neural Nets (lecture-19)


Presentation Transcript


  1. CS623: Introduction to Computing with Neural Nets (lecture-19)
     Pushpak Bhattacharyya
     Computer Science and Engineering Department
     IIT Bombay

  2. To learn the identity function
     The setting is probabilistic: x = 1 or x = -1 with uniform probability, i.e., P(x=1) = 0.5, P(x=-1) = 0.5.
     For x = 1,  y = 1  with P = 0.9
     For x = -1, y = -1 with P = 0.9
     Illustration of the basic idea of the Boltzmann Machine:
     [Figure: input neuron 1 (x) connected to output neuron 2 (y) by weight w12]
       y    x
       1    1
      -1   -1
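The probabilistic setting above can be made concrete with a few lines of code. Below is a minimal sketch (not from the original slides; the function name sample_pair and the sample size are illustrative choices) that samples training pairs for this noisy identity task.

```python
import random

def sample_pair():
    # x is +1 or -1 with equal probability; y copies x with probability 0.9
    x = random.choice([1, -1])
    y = x if random.random() < 0.9 else -x
    return x, y

# empirical check of the target distribution: P(y = x) should be close to 0.9
pairs = [sample_pair() for _ in range(10000)]
print(sum(x == y for x, y in pairs) / len(pairs))
```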

  3. Illustration of the basic idea of Boltzmann Machine (contd.)
     • Let
       α = output neuron states
       β = input neuron states
       Pα|β = observed probability distribution
       Qα|β = desired probability distribution
       Qβ = probability distribution on the input states β

  4. Illustration of the basic idea of Boltzmann Machine (contd.)
     • The divergence D is given by the KL divergence formula:
       D = ∑α ∑β Qα|β Qβ ln(Qα|β / Pα|β)
     • D is non-negative. Using ln x ≥ 1 − 1/x:
       D = ∑α ∑β Qα|β Qβ ln(Qα|β / Pα|β)
         ≥ ∑α ∑β Qα|β Qβ (1 − Pα|β / Qα|β)
         = ∑α ∑β Qα|β Qβ − ∑α ∑β Pα|β Qβ
         = ∑α ∑β Qαβ − ∑α ∑β Pαβ      {Qαβ and Pαβ are the joint distributions}
         = 1 − 1 = 0
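As a numerical illustration of the formula above, here is a short sketch that computes D for a two-state input/output pair and confirms it is non-negative. The dictionary-based representation and the example probabilities are invented for illustration, not taken from the slides.

```python
import math

def kl_divergence(Q_cond, P_cond, Q_beta):
    # D = sum_beta Q(beta) * sum_alpha Q(alpha|beta) * ln( Q(alpha|beta) / P(alpha|beta) )
    # Q_cond, P_cond: dicts mapping (alpha, beta) -> probability; Q_beta: dict beta -> probability
    D = 0.0
    for (alpha, beta), q in Q_cond.items():
        p = P_cond[(alpha, beta)]
        if q > 0:
            D += Q_beta[beta] * q * math.log(q / p)
    return D

# hypothetical numbers for the identity task: desired Q vs. an imperfect observed P
Q_beta = {1: 0.5, -1: 0.5}
Q_cond = {(1, 1): 0.9, (-1, 1): 0.1, (1, -1): 0.1, (-1, -1): 0.9}
P_cond = {(1, 1): 0.7, (-1, 1): 0.3, (1, -1): 0.4, (-1, -1): 0.6}
print(kl_divergence(Q_cond, P_cond, Q_beta))  # non-negative; 0 only when P matches Q
```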

  5. Gradient descent for finding the weight change rule
     P(Sα) ∝ exp(−E(Sα)/T)
     P(Sα) = exp(−E(Sα)/T) / ∑β∈all states exp(−E(Sβ)/T)
     ln P(Sα) = −E(Sα)/T − ln Z
     D = ∑α ∑β Qα|β Qβ ln(Qα|β / Pα|β)
     Δwij = −η (∂D/∂wij); gradient descent
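The relation P(Sα) ∝ exp(−E(Sα)/T) can be checked by brute-force enumeration for a small network. The sketch below is a toy, assuming the bias-free energy E(S) = −∑i ∑j>i wij si sj used on the following slides; it normalizes over all ±1 state vectors to obtain Z and P(S).

```python
import math
from itertools import product

def boltzmann_distribution(weights, T=1.0):
    # P(S) = exp(-E(S)/T) / Z over all +/-1 state vectors,
    # with E(S) = -sum_{i<j} w_ij * s_i * s_j (no bias terms, for simplicity)
    n = len(weights)

    def energy(s):
        return -sum(weights[i][j] * s[i] * s[j]
                    for i in range(n) for j in range(i + 1, n))

    states = list(product([1, -1], repeat=n))
    unnorm = [math.exp(-energy(s) / T) for s in states]
    Z = sum(unnorm)
    return {s: u / Z for s, u in zip(states, unnorm)}

# two neurons joined by w12 = 1: the aligned states (1, 1) and (-1, -1) are most probable
print(boltzmann_distribution([[0.0, 1.0], [1.0, 0.0]]))
```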

  6. Calculating gradient: 1/2
     ∂D/∂wij = ∂/∂wij [∑α ∑β Qα|β Qβ ln(Qα|β / Pα|β)]
             = ∂/∂wij [∑α ∑β Qα|β Qβ ln Qα|β − ∑α ∑β Qα|β Qβ ln Pα|β]
     The first term is constant with respect to wij, so only the second term contributes.
     ∂(ln Pα|β)/∂wij = ∂/∂wij [−E(Sα)/T − ln Z],  where Z = ∑β exp(−E(Sβ)/T)

  7. Calculating gradient: 2/2
     ∂[−E(Sα)/T]/∂wij = (−1/T) ∂/∂wij [−∑i ∑j>i wij si sj]
                      = (−1/T)(−si sj|α) = (1/T) si sj|α
     ∂(ln Z)/∂wij = (1/Z)(∂Z/∂wij)
     Z = ∑β exp(−E(Sβ)/T)
     ∂Z/∂wij = ∑β exp(−E(Sβ)/T) · ∂(−E(Sβ)/T)/∂wij
             = (1/T) ∑β exp(−E(Sβ)/T) · si sj|β
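The identity derived above, ∂(ln Z)/∂wij = (1/T) (1/Z) ∑β exp(−E(Sβ)/T) si sj|β, can be verified numerically for the two-neuron machine of slide 2. The sketch below is only a consistency check (the helper names log_Z and expected_s1s2 are mine); it compares a finite-difference derivative of ln Z with the free-running expectation of s1 s2.

```python
import math
from itertools import product

def log_Z(w12, T=1.0):
    # ln Z for the two-neuron machine with E(S) = -w12 * s1 * s2
    return math.log(sum(math.exp(w12 * s1 * s2 / T)
                        for s1, s2 in product([1, -1], repeat=2)))

def expected_s1s2(w12, T=1.0):
    # (1/Z) * sum_beta exp(-E(S_beta)/T) * s1*s2|beta  =  <s1*s2> under P(S)
    states = list(product([1, -1], repeat=2))
    unnorm = [math.exp(w12 * s1 * s2 / T) for s1, s2 in states]
    Z = sum(unnorm)
    return sum(s1 * s2 * u / Z for (s1, s2), u in zip(states, unnorm))

# finite-difference check of  d(ln Z)/dw12 = (1/T) * <s1*s2>
w, h, T = 0.5, 1e-6, 1.0
print((log_Z(w + h, T) - log_Z(w - h, T)) / (2 * h), expected_s1s2(w, T) / T)
```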

  8. Final formula for Δwij
     Δwij = (1/T)[si sj|α − (1/Z) ∑β exp(−E(Sβ)/T) · si sj|β]
          = (1/T)[si sj|α − ∑β P(Sβ) · si sj|β]
     The second term is the expectation of the ith and jth neurons being on together (under the free-running distribution).
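Putting the pieces together, a minimal sketch of the resulting learning rule for the two-neuron machine of slide 2 is given below. It estimates the clamped term si sj|α from observed (x, y) pairs and the free term by exact enumeration; the learning rate η (absorbed into the constant on the slide) and the function name delta_w are my additions.

```python
import math
from itertools import product

def delta_w(w12, data_pairs, T=1.0, eta=0.1):
    # Boltzmann weight update for the two-neuron machine:
    # delta_w = (eta/T) * ( <s1*s2>_clamped - <s1*s2>_free )
    clamped = sum(x * y for x, y in data_pairs) / len(data_pairs)

    # free-running expectation under P(S) with E(S) = -w12 * s1 * s2
    states = list(product([1, -1], repeat=2))
    unnorm = [math.exp(w12 * s1 * s2 / T) for s1, s2 in states]
    Z = sum(unnorm)
    free = sum(s1 * s2 * u / Z for (s1, s2), u in zip(states, unnorm))

    return (eta / T) * (clamped - free)

# with mostly-agreeing (x, y) data, the clamped correlation exceeds the free one
# at w12 = 0, so the update drives w12 upward
print(delta_w(0.0, [(1, 1), (-1, -1), (1, 1), (1, -1)]))
```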

  9. Issue of Hidden Neurons
     • Boltzmann machines:
       – can come with hidden neurons
       – are equivalent to a Markov Random Field
       – with hidden neurons are like a Hidden Markov Model
     • Training a Boltzmann machine is equivalent to running the Expectation Maximization algorithm

  10. Use of Boltzmann machine
      • Computer Vision: understanding a scene involves what is called “Relaxation Search”, which gradually minimizes a cost function with progressive relaxation of constraints
      • The Boltzmann machine has been found to be slow to train
      • Boltzmann training is NP-hard

  11. Questions
      • Does the Boltzmann machine reach the global minimum? What ensures it?
      • Why is simulated annealing applied to the Boltzmann machine?
        local minimum → increase T → let the network run → gradually reduce T → reach the global minimum
      • Understand the effect of varying T:
        higher T → small differences in energy states are ignored; convergence (to a local minimum) is fast
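To make the role of the temperature schedule concrete, here is a generic simulated-annealing sketch (not the slides' algorithm; the schedule parameters T0, cooling, steps_per_T and the helper names are arbitrary illustrative choices). At high T almost any move is accepted, so the search can escape local minima; lowering T gradually makes the acceptance rule greedy.

```python
import math
import random

def anneal(energy, neighbors, state, T0=10.0, cooling=0.95, steps_per_T=100, T_min=0.01):
    # at high T uphill moves are often accepted; as T falls the rule becomes greedy
    T = T0
    while T > T_min:
        for _ in range(steps_per_T):
            candidate = random.choice(neighbors(state))
            dE = energy(candidate) - energy(state)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                state = candidate
        T *= cooling
    return state

# toy usage: minimise E(s) = -s1*s2 over +/-1 states by flipping one unit at a time
flip = lambda s: [s[:i] + (-s[i],) + s[i + 1:] for i in range(len(s))]
print(anneal(lambda s: -s[0] * s[1], flip, (1, -1)))
```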
