370 likes | 381 Views
Explore pattern association and memory in neural networks covering autoassociative and heteroassociative networks, Hebb rule, Delta rule, activation functions, storage capacity, and practical examples. Learn through exercises and examples to enhance understanding.
E N D
Introduction to Neural Networks John Paxton Montana State University Summer 2003
Chapter 3: Pattern Association • Aristotle’s observed that human memory associates • similar items • contrary items • items close in proximity • items close in succession (a song)
Terminology and Issues • Autoassociative Networks • Heteroassociative Networks • Feedforward Networks • Recurrent Networks • How many patterns can be stored?
Hebb Rule for Pattern Association • Architecture w11 x1 y1 xn ym wnm
Algorithm 1. set wij = 0 1 <= i <= n, 1 <= j <= m 2. for each training pair s:t 3. xi = si 4. yj = tj 5. wij(new) = wij(old) + xiyj
Example • s1 = (1 -1 -1), s2 = (-1 1 1) • t1 = (1 -1), t2 = (-1 1) • w11 = 1*1 + (-1)(-1) = 2 • w12 = 1*(-1) + (-1)1 = -2 • w21 = (-1)1+ 1(-1) = -2 • w22 = (-1)(-1) + 1(1) = 2 • w31 = (-1)1 + 1(-1) = -2 • w32 = (-1)(-1) + 1*1 = 2
Matrix Alternative • s1 = (1 -1 -1), s2 = (-1 1 1) • t1 = (1 -1), t2 = (-1 1) 1 -1 1 -1 2 -2 -1 1 -1 1 = -2 2 -1 1 -2 2
Final Network • f(yin) = 1 if yin > 0, 0 if yin = 0, else -1 2 x1 y1 -2 -2 x2 2 y2 -2 x3 2
Properties • Weights exist if input vectors are linearly independent • Orthogonal vectors can be learned perfectly • High weights imply strong correlations
Exercises • What happens if (-1 -1 -1) is tested? This vector has one mistake. • What happens if (0 -1 -1) is tested? This vector has one piece of missing data. • Show an example of training data that is not learnable. Show the learned network.
Delta Rule for Pattern Association • Works when patterns are linearly independent but not orthogonal • Introduced in the 1960s for ADALINE • Produces a least squares solution
Activation Functions • Delta Rule (1) wij(new) = wij(old) + a(tj – yj)*xi*1 • Extended Delta Rule (f’(yin.j)) wij(new) = wij(old) + a(tj – yj)*xi*f’(yin.j)
Heteroassociative Memory Net • Application: Associate characters.A <-> aB <-> b
Autoassociative Net • Architecture w11 x1 y1 xn yn wnn
Training Algorithm • Assuming that the training vectors are orthogonal, we can use the Hebb rule algorithm mentioned earlier. • Application: Find out whether an input vector is familiar or unfamiliar. For example, voice input as part of a security system.
Autoassociate Example 1 1 1 1 1 1 1 0 1 1 1 = 1 1 1 = 1 0 1 1 1 1 1 1 1 0
Evaluation • What happens if (1 1 1) is presented? • What happens if (0 1 1) is presented? • What happens if (0 0 1) is presented? • What happens if (-1 1 1) is presented? • What happens if (-1 -1 1) is presented? • Why are the diagonals set to 0?
Storage Capacity • 2 vectors (1 1 1), (-1 -1 -1) • Recall is perfect 1 -1 1 1 1 0 2 2 1 -1 -1 -1 -1 = 2 0 2 1 -1 2 2 0
Storage Capacity • 3 vectors: (1 1 1), (-1 -1 -1), (1 -1 1) • Recall is no longer perfect 1 -1 1 1 1 1 0 1 3 1 -1 -1 -1 -1 -1 = 1 0 1 1 -1 1 1 -1 1 3 1 0
Theorem • Up to n-1 bipolar vectors of n dimensions can be stored in an autoassociative net.
Iterative Autoassociative Net • 1 vector: s = (1 1 -1) • st * s = 0 1 -1 1 0 -1 -1 -1 0 • (1 0 0) -> (0 1 -1) • (0 1 -1) -> (2 1 -1) -> (1 1 -1) • (1 1 -1) -> (2 2 -2) -> (1 1 -1)
Testing Procedure 1. initialize weights using Hebb learning 2. for each test vector do 3. set xi = si 4. calculate ti 5. set si = ti 6. go to step 4 if the s vector is new
Exercises • 1 piece of missing data: (0 1 -1) • 2 pieces of missing data: (0 0 -1) • 3 pieces of missing data: (0 0 0) • 1 mistake: (-1 1 -1) • 2 mistakes: (-1 -1 -1)
Discrete Hopfield Net • content addressable problems • pattern association problems • constrained optimization problems • wij = wji • wii = 0
Characteristics • Only 1 unit updates its activation at a time • Each unit continues to receive the external signal • An energy (Lyapunov) function can be found that allows the net to converge, unlike the previous system • Autoassociative
Architecture x2 y2 y1 y3 x1 x3
Algorithm 1. initialize weights using Hebb rule 2. for each input vector do 3. yi = xi 4. do steps 5-6 randomly for each yi 5. yin.i = xi + Syjwji 6. calculate f(yin.i) 7. go to step 2 if the net hasn’t converged
Example • training vector: (1 -1) y1 y2 -1 x1 x2
Example • input (0 -1) update y1 = 0 + (-1)(-1) = 1 update y2 = -1 + 1(-1) = -2 -> -1 • input (1 -1) update y2 = -1 + 1(-1) = -2 -> -1 update y1 = 1 + -1(-1) = 2 -> 1
Hopfield Theorems • Convergence is guaranteed. • The number of storable patterns is approximately n / (2 * log n) where n is the dimension of a vector
Bidirectional Associative Memory (BAM) • Heteroassociative Recurrent Net • Kosko, 1988 • Architecture x1 y1 ym xn
Activation Function • f(yin) = 1, if yin > 0 • f(yin) = 0, if yin = 0 • f(yin) = -1 otherwise
Algorithm 1. initialize weights using Hebb rule 2. for each test vector do 3. present s to x layer 4. present t to y layer 5. while equilibrium is not reached 6. compute f(yin.j) 7. compute f(xin.j)
Example • s1 = (1 1), t1 = (1 -1) • s2 = (-1 -1), t2 = (-1 1) 1 -1 1 -1 2 -2 1 -1 -1 1 2 -2
Example • Architecture 2 x1 y1 -2 2 y2 x2 -2 present (1 1) to x -> 1 -1 present (1 -1) to y -> 1 1
Hamming Distance • Definition: Number of different corresponding bits in two vectors • For example, H[(1 -1), (1 1)] = 1 • Average Hamming Distance is ½.
About BAMs • Observation: Encoding is better when the average Hamming distance of the inputs is similar to the average Hamming distance of the outputs. • The memory capacity of a BAM is min(n-1, m-1).