Introduction to Neural Networks

This chapter samples neural nets beyond the standard architectures. It covers nets for optimization problems such as the Travelling Salesperson Problem, map coloring, and job shop scheduling, where neural nets can find near-optimal solutions and handle weak constraints. It also covers common extensions (modified Hebbian learning, Boltzmann Machines with learning, Simple Recurrent Nets), adaptive architectures (Probabilistic Neural Net, Cascade Correlation), and the Neocognitron, along with their algorithms and training methods.

Presentation Transcript


  1. Introduction to Neural Networks John Paxton Montana State University Summer 2003

  2. Chapter 7: A Sampler Of Other Neural Nets • Optimization Problems • Common Extensions • Adaptive Architectures • Neocognitron

  3. I. Optimization Problems • Travelling Salesperson Problem. • Map coloring. • Job shop scheduling. • RNA secondary structure.

  4. Advantages of Neural Nets • Can find near optimal solutions. • Can handle weak (desirable, but not required) constraints.

  5. TSP Topology • Each row has 1 unit that is on • Each column has 1 unit that is on [Figure: grid of units with rows City A, City B, City C and columns 1st, 2nd, 3rd]
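
The two constraints on this slide can be checked mechanically. The following is a minimal Python sketch, assuming rows stand for cities and columns for positions in the tour; the function names `is_valid_tour` and `tour_order` and the sample grid are illustrative and do not come from the slides.

```python
def is_valid_tour(u):
    """True if every row and every column of the 0/1 grid u has exactly one unit on."""
    n = len(u)
    rows_ok = all(sum(row) == 1 for row in u)
    cols_ok = all(sum(u[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok


def tour_order(u, cities):
    """Read the visiting order off a valid grid: column j names the city visited j-th."""
    n = len(u)
    return [cities[next(i for i in range(n) if u[i][j] == 1)] for j in range(n)]


# Rows are City A, City B, City C; columns are the 1st, 2nd, 3rd positions.
grid = [
    [0, 1, 0],  # City A is visited 2nd
    [1, 0, 0],  # City B is visited 1st
    [0, 0, 1],  # City C is visited 3rd
]
```

For this grid, `is_valid_tour(grid)` holds and `tour_order(grid, ["A", "B", "C"])` reads the tour as B, A, C.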

  6. Boltzmann Machine • Hinton, Sejnowski (1983) • Can be modelled using Markov chains • Uses simulated annealing • Each row is fully interconnected • Each column is fully interconnected

  7. Architecture • u_i,j connected to u_k,j+1 with -d_i,k • u_i,1 connected to u_k,n with -d_i,k [Figure: n × n grid of units U11 … Unn, with weights labelled b and -p]

  8. Algorithm 1. Initialize weights b and p, with p > b and p > greatest distance between cities. Initialize temperature T. Initialize activations of units to random binary values.

  9. Algorithm 2. While stopping condition is false, do steps 3 – 8. 3. Do steps 4 – 7 n² times (1 epoch). 4. Choose i and j randomly, 1 ≤ i, j ≤ n; u_ij is the candidate to change state.

  10. Algorithm 5. Compute c = [1 - 2u_ij][b + Σ Σ u_km (-p)], where k ≠ i, m ≠ j. 6. Compute the probability of accepting the change: a = 1 / (1 + e^(-c/T)). 7. Accept the change if a random number in [0, 1] is less than a. If the change is accepted, u_ij = 1 - u_ij. 8. Adjust the temperature: T = 0.95T.

  11. Stopping Condition • No state change for a specified number of epochs. • Temperature reaches a certain value.

  12. Example • T(0) = 20 • ½ units are on initially • b = 60 • p = 70 • 10 cities, all distances less than 1 • 200 or fewer epochs to find stable configuration in 100 random trials
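
Algorithm steps 1 to 8, with the example parameters above as defaults, might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the presenter's implementation. The function names, the `patience` and `T_min` stopping parameters, and the overflow guard are illustrative. The change c is taken as [1 - 2u_ij] times the net input to u_ij, where the net input is b, minus p for each other unit that is on in row i or column j, minus the distance connections from the Architecture slide. Rows are assumed to be cities and columns tour positions, with at least 3 cities.

```python
import math
import random


def accept_probability(c, T):
    """Step 6: a = 1 / (1 + e^(-c/T)), guarded against overflow at low T."""
    x = -c / T
    if x > 700:  # e^x would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def delta_consensus(u, d, i, j, b, p):
    """Step 5, extended with the distance connections (an assumption of this sketch)."""
    n = len(u)
    # Other units that are on in row i or column j are joined to u[i][j] by -p.
    conflicts = sum(u[i]) + sum(u[k][j] for k in range(n)) - 2 * u[i][j]
    # Units in the neighbouring columns (with wrap-around) are joined by -d[i][k].
    nxt, prv = (j + 1) % n, (j - 1) % n
    dist = sum(d[i][k] * (u[k][nxt] + u[k][prv]) for k in range(n) if k != i)
    return (1 - 2 * u[i][j]) * (b - p * conflicts - dist)


def boltzmann_tsp(d, b=60.0, p=70.0, T=20.0, cooling=0.95,
                  T_min=1e-3, patience=10, rng=None):
    """Anneal an n x n grid of binary units for the distance matrix d."""
    rng = rng or random.Random()
    n = len(d)
    u = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]  # step 1
    quiet_epochs = 0
    # Step 2: stop when T is low or nothing has changed for `patience` epochs.
    while T > T_min and quiet_epochs < patience:
        changed = False
        for _ in range(n * n):  # step 3: one epoch
            i, j = rng.randrange(n), rng.randrange(n)  # step 4
            c = delta_consensus(u, d, i, j, b, p)  # step 5
            if rng.random() < accept_probability(c, T):  # steps 6 and 7
                u[i][j] = 1 - u[i][j]
                changed = True
        quiet_epochs = 0 if changed else quiet_epochs + 1
        T *= cooling  # step 8
    return u
```

Calling `boltzmann_tsp(d)` with a symmetric distance matrix whose entries are all below 1 returns the final grid of activations. The two tests in the `while` condition correspond to the two stopping conditions on the Stopping Condition slide.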

  13. Other Optimization Architectures • Continuous Hopfield Net • Gaussian Machine • Cauchy Machine • Adds noise to input in attempt to escape from local minima • Faster annealing schedule can be used as a consequence

  14. II. Extensions • Modified Hebbian Learning • Find parameters for optimal surface fit of training patterns

  15. Boltzmann Machine With Learning • Add hidden units • 2-1-2 net below could be used for simple encoding/decoding (data compression) [Figure: 2-1-2 net with units x1, x2, z1, y1, y2]

  16. Simple Recurrent Net • Learn sequential or time varying patterns • Doesn’t necessarily have steady state output • input units • context units • hidden units • output units

  17. Architecture [Figure: input units x1 … xn, context units c1 … cp, hidden units z1 … zp, output units y1 … ym]

  18. Simple Recurrent Net • f(c_i(t)) = f(z_i(t-1)) • f(c_i(0)) = 0.5 • Can use backpropagation • Can learn string of characters
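
The role of the context units can be illustrated with a forward pass. The following is a minimal Python sketch, assuming a logistic activation function and no bias terms (neither is specified on the slides); the weight-matrix names `W_xz`, `W_cz`, `W_zy` and the function names are illustrative.

```python
import math


def sigmoid(x):
    """Logistic activation (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def srn_step(x, context, W_xz, W_cz, W_zy):
    """One time step: hidden units see both the input and the context units."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(W_xz[h], x))
                + sum(w * ci for w, ci in zip(W_cz[h], context)))
        for h in range(len(W_xz))
    ]
    output = [sigmoid(sum(w * zh for w, zh in zip(row, hidden))) for row in W_zy]
    # The next context is a copy of the hidden activations: c_i(t) = z_i(t-1).
    return output, list(hidden)


def srn_run(sequence, W_xz, W_cz, W_zy):
    """Feed a sequence through the net, starting with every context unit at 0.5."""
    context = [0.5] * len(W_xz)
    outputs = []
    for x in sequence:
        y, context = srn_step(x, context, W_xz, W_cz, W_zy)
        outputs.append(y)
    return outputs
```

Because the context is only a copy of the previous hidden activations, each time step is an ordinary feedforward pass, which is why standard backpropagation can be applied.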

  19. Example: Finite State Automaton • 4 xi • 4 yi • 2 zi • 2 ci [Figure: finite state automaton with labels BEGIN, A, B, END]

  20. Backpropagation In Time • Rumelhart, Hinton, Williams (1986) • Application: Simple shift register [Figure: shift-register net with units x1, x2, z1, y1, y2 and connections labelled "1 (fixed)"]

  21. Backpropagation Training for Fully Recurrent Nets • Adapts backpropagation to arbitrary connection patterns.

  22. III. Adaptive Architectures • Probabilistic Neural Net (Specht 1988) • Cascade Correlation (Fahlman, Lebiere 1990)

  23. Probabilistic Neural Net • Builds its own architecture as training progresses • Chooses class A over class B if h_A c_A f_A(x) > h_B c_B f_B(x) • c_A is the cost of classifying an example as belonging to B when it belongs to A • h_A is the a priori probability of an example belonging to class A

  24. Probabilistic Neural Net • f_A(x) is the probability density function for class A; it is learned by the net • z_A1: pattern unit, f_A: summation unit [Figure: input units x1 … xn; pattern units z_A1 … z_Aj and z_B1 … z_Bk; summation units f_A, f_B; output unit y]
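
The decision rule and the pattern/summation units can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each training example becomes one pattern unit, the density is estimated with a Gaussian kernel of width `sigma`, and the kernel's normalizing constant is dropped because it is the same for both classes. The kernel choice, the default values, and the function names are illustrative and not taken from the slides.

```python
import math


def parzen_density(x, patterns, sigma=0.5):
    """Summation unit: average of the pattern-unit responses for one class."""
    total = 0.0
    for w in patterns:  # one pattern unit per stored training example
        sq_dist = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / len(patterns)


def choose_class(x, patterns_A, patterns_B,
                 h_A=0.5, h_B=0.5, c_A=1.0, c_B=1.0, sigma=0.5):
    """Return "A" if h_A c_A f_A(x) > h_B c_B f_B(x), otherwise "B"."""
    score_A = h_A * c_A * parzen_density(x, patterns_A, sigma)
    score_B = h_B * c_B * parzen_density(x, patterns_B, sigma)
    return "A" if score_A > score_B else "B"
```

In this sketch the net "builds its own architecture" simply by appending each new training example to `patterns_A` or `patterns_B`. Raising `c_A` or `h_A` shifts borderline examples toward class A.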

  25. Cascade Correlation • Builds its own architecture as training progresses • Tries to overcome the slow rate of convergence of other neural nets • Dynamically adds hidden units (as few as possible) • Trains one layer at a time

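  26. Cascade Correlation • Stage 1 [Figure: input units x0, x1, x2; output units y1, y2]

  27. Cascade Correlation • Stage 2 (fix weights into z1) [Figure: input units x0, x1, x2; hidden unit z1; output units y1, y2]

  28. Cascade Correlation • Stage 3 (fix weights into z2) [Figure: input units x0, x1, x2; hidden units z1, z2; output units y1, y2]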

  29. Algorithm 1. Train stage 1. If error is not acceptable, proceed. 2. Train stage 2. If error is not acceptable, proceed. 3. Etc.

  30. IV. Neocognitron • Fukushima, Miyake, Ito (1983) • Many layers, hierarchical • Very sparse and localized connections • Self organizing • Supervised learning, layer by layer • Recognizes handwritten 0, 1, 2, 3, … 9, regardless of position and style

  31. Architecture

  32. Architecture • S layers respond to patterns • C layers combine results, use larger field of view • For example, S11 responds to the 3 × 3 pattern with rows 0 0 0, 1 1 1, 0 0 0
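
The division of labour between S and C layers can be illustrated with a heavily simplified Python sketch. It is not Fukushima's cell equations: here an S unit fires only on an exact match of a 3 × 3 window, and a C unit fires if any S unit in its block fires. The template assumes the nine values above are read row by row (a horizontal bar); the function names and the block size are illustrative.

```python
HORIZONTAL_BAR = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def s_response(image, r, c, template):
    """1 if the 3 x 3 window centred at (r, c) equals the template, else 0."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if image[r + dr][c + dc] != template[dr + 1][dc + 1]:
                return 0
    return 1


def s_layer(image, template):
    """Apply the same template at every interior position of the image."""
    rows, cols = len(image), len(image[0])
    return [[s_response(image, r, c, template) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


def c_layer(s_map, size=3):
    """Each C unit is on if any S unit in its size x size block is on."""
    rows, cols = len(s_map), len(s_map[0])
    return [[max(s_map[r + dr][c + dc]
                 for dr in range(size) for dc in range(size)
                 if r + dr < rows and c + dc < cols)
             for c in range(0, cols, size)]
            for r in range(0, rows, size)]
```

With a 5 × 5 image, a bar in the middle row and a bar one row higher switch on different S units but the same C unit, which is the sense in which the C layer's larger field of view gives tolerance to position.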

  33. Training • Progresses layer by layer • S1 connections to C1 are fixed • C1 connections to S2 are adaptable • A V2 layer is introduced between C1 and S2; V2 is inhibitory • C1 to V2 connections are fixed • V2 to S2 connections are adaptable
