Introduction to Neural Networks John Paxton Montana State University Summer 2003
Chapter 3: Pattern Association • Aristotle observed that human memory associates • similar items • contrary items • items close in proximity • items close in succession (a song)
Terminology and Issues • Autoassociative Networks • Heteroassociative Networks • Feedforward Networks • Recurrent Networks • How many patterns can be stored?
Hebb Rule for Pattern Association • Architecture: a single-layer feedforward net with input units x1 ... xn fully connected to output units y1 ... ym through weights w11 ... wnm
Algorithm 1. set wij = 0 (1 <= i <= n, 1 <= j <= m) 2. for each training pair s:t 3. xi = si 4. yj = tj 5. wij(new) = wij(old) + xi*yj
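A minimal sketch of this algorithm in Python (numpy and the helper name hebb_train are my own, not from the slides):

```python
import numpy as np

def hebb_train(S, T):
    """Hebb rule for pattern association.

    S: (p, n) array, one bipolar training input s per row.
    T: (p, m) array, the corresponding targets t.
    Returns the (n, m) weight matrix W.
    """
    n, m = S.shape[1], T.shape[1]
    W = np.zeros((n, m))               # step 1: w_ij = 0
    for s, t in zip(S, T):             # step 2: for each training pair s:t
        x, y = s, t                    # steps 3-4: x_i = s_i, y_j = t_j
        W += np.outer(x, y)            # step 5: w_ij(new) = w_ij(old) + x_i*y_j
    return W
```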
Example • s1 = (1 -1 -1), s2 = (-1 1 1) • t1 = (1 -1), t2 = (-1 1) • w11 = 1*1 + (-1)(-1) = 2 • w12 = 1*(-1) + (-1)1 = -2 • w21 = (-1)1+ 1(-1) = -2 • w22 = (-1)(-1) + 1(1) = 2 • w31 = (-1)1 + 1(-1) = -2 • w32 = (-1)(-1) + 1*1 = 2
Matrix Alternative • s1 = (1 -1 -1), s2 = (-1 1 1) • t1 = (1 -1), t2 = (-1 1) • W = S^T T, where the rows of S are the s vectors and the rows of T are the t vectors:
S^T =        T =         W = S^T T =
 1 -1         1 -1        2 -2
-1  1        -1  1       -2  2
-1  1                    -2  2
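The same product can be checked in a couple of lines of numpy (a sketch; the variable names are illustrative):

```python
import numpy as np

S = np.array([[ 1, -1, -1],    # rows are s1, s2
              [-1,  1,  1]])
T = np.array([[ 1, -1],        # rows are t1, t2
              [-1,  1]])

W = S.T @ T                    # Hebb weights as a single matrix product
print(W)
# [[ 2 -2]
#  [-2  2]
#  [-2  2]]
```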
Final Network • f(yin) = 1 if yin > 0, 0 if yin = 0, else -1 • weights (the W computed above): w11 = 2, w12 = -2, w21 = -2, w22 = 2, w31 = -2, w32 = 2
Properties • Weights exist if input vectors are linearly independent • Orthogonal vectors can be learned perfectly • High weights imply strong correlations
Exercises • What happens if (-1 -1 -1) is tested? This vector has one mistake. • What happens if (0 -1 -1) is tested? This vector has one piece of missing data. • Show an example of training data that is not learnable. Show the learned network.
Delta Rule for Pattern Association • Works when patterns are linearly independent but not orthogonal • Introduced in the 1960s for ADALINE • Produces a least squares solution
Activation Functions • Delta Rule (identity activation, so f'(yin.j) = 1): wij(new) = wij(old) + a(tj – yj)*xi*1 • Extended Delta Rule (general differentiable f): wij(new) = wij(old) + a(tj – yj)*xi*f'(yin.j)
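A hedged sketch of one delta-rule pass, assuming the identity activation and a small learning rate a (both choices are mine, not stated on this slide):

```python
import numpy as np

def delta_rule_epoch(W, S, T, a=0.1):
    """One pass of the delta rule over the training pairs.

    W: (n, m) float array of current weights.
    Uses the identity activation, so y_j = yin_j and f'(yin_j) = 1,
    giving the plain delta rule above; substituting another f and its
    derivative gives the extended rule.
    """
    for x, t in zip(S, T):
        y = x @ W                        # yin_j = sum_i x_i * w_ij
        W += a * np.outer(x, t - y)      # w_ij += a * (t_j - y_j) * x_i
    return W
```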
Heteroassociative Memory Net • Application: associate characters, e.g. A <-> a, B <-> b
Autoassociative Net • Architecture: a single-layer net with input units x1 ... xn fully connected to output units y1 ... yn through weights w11 ... wnn (input and output layers have the same dimension)
Training Algorithm • Assuming that the training vectors are orthogonal, we can use the Hebb rule algorithm mentioned earlier. • Application: Find out whether an input vector is familiar or unfamiliar. For example, voice input as part of a security system.
Autoassociate Example • s = (1 1 1)
s^T s =        with the diagonal zeroed:
1 1 1          0 1 1
1 1 1          1 0 1
1 1 1          1 1 0
Evaluation • What happens if (1 1 1) is presented? • What happens if (0 1 1) is presented? • What happens if (0 0 1) is presented? • What happens if (-1 1 1) is presented? • What happens if (-1 -1 1) is presented? • Why are the diagonals set to 0?
Storage Capacity • 2 vectors: (1 1 1), (-1 -1 -1) • Recall is perfect
S^T =       S =            S^T S (diagonal zeroed) =
1 -1         1  1  1       0 2 2
1 -1        -1 -1 -1       2 0 2
1 -1                       2 2 0
Storage Capacity • 3 vectors: (1 1 1), (-1 -1 -1), (1 -1 1) • Recall is no longer perfect
S^T =          S =            S^T S (diagonal zeroed) =
1 -1  1         1  1  1       0 1 3
1 -1 -1        -1 -1 -1       1 0 1
1 -1  1         1 -1  1       3 1 0
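A quick numeric check of this claim, as a sketch (sign-threshold recall with the zero-diagonal Hebb weights; numpy and the variable names are my own):

```python
import numpy as np

V = np.array([[ 1,  1,  1],
              [-1, -1, -1],
              [ 1, -1,  1]])    # the three stored vectors, one per row
W = V.T @ V
np.fill_diagonal(W, 0)          # zero the diagonal as on the slides

print(np.sign(V @ W))           # threshold the net input at 0
# the third vector (1 -1 1) comes back as (1 1 1), so recall is imperfect
```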
Theorem • Up to n-1 mutually orthogonal bipolar vectors of n dimensions can be stored in an autoassociative net.
Iterative Autoassociative Net • 1 vector: s = (1 1 -1)
s^T s (diagonal zeroed) =
 0  1 -1
 1  0 -1
-1 -1  0
• (1 0 0) -> (0 1 -1)
• (0 1 -1) -> (2 1 -1) -> (1 1 -1)
• (1 1 -1) -> (2 2 -2) -> (1 1 -1)
Testing Procedure 1. initialize weights using Hebb learning 2. for each test vector do 3. set xi = si 4. calculate ti 5. set si = ti 6. go to step 4 if the s vector is new
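A minimal sketch of this testing loop, assuming the 1 / 0 / -1 threshold defined earlier and an iteration cap of my own choosing:

```python
import numpy as np

def iterative_recall(W, s, max_iters=100):
    """Feed the thresholded output back in until the vector stops changing."""
    s = np.asarray(s, dtype=float)
    for _ in range(max_iters):
        t = np.sign(s @ W)            # step 4: calculate t (f(yin) as defined earlier)
        if np.array_equal(t, s):      # step 6: stop once the s vector is no longer new
            break
        s = t                         # step 5: set s = t and iterate
    return s

# Weights for s = (1 1 -1) with the diagonal zeroed, as in the example above
W = np.array([[ 0,  1, -1],
              [ 1,  0, -1],
              [-1, -1,  0]], dtype=float)
print(iterative_recall(W, [1, 0, 0]))   # converges to (1 1 -1)
```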
Exercises • 1 piece of missing data: (0 1 -1) • 2 pieces of missing data: (0 0 -1) • 3 pieces of missing data: (0 0 0) • 1 mistake: (-1 1 -1) • 2 mistakes: (-1 -1 -1)
Discrete Hopfield Net • content addressable problems • pattern association problems • constrained optimization problems • wij = wji • wii = 0
Characteristics • Only 1 unit updates its activation at a time • Each unit continues to receive the external signal • An energy (Lyapunov) function can be found that allows the net to converge, unlike the previous system • Autoassociative
Architecture • a fully interconnected layer of units (here y1, y2, y3), each also receiving its own external input signal (x1, x2, x3)
Algorithm 1. initialize weights using Hebb rule 2. for each input vector do 3. yi = xi 4. do steps 5-6 randomly for each yi 5. yin.i = xi + Σj yj*wji 6. calculate f(yin.i) 7. go to step 4 if the net hasn't converged
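A hedged sketch of this algorithm; the tie rule (a unit keeps its previous activation when its net input is exactly zero) and the convergence test (stop when no unit changes during a sweep) are assumptions on my part:

```python
import numpy as np

def hopfield_recall(W, x, max_sweeps=100):
    """Discrete Hopfield recall with asynchronous updates in random order.

    W: symmetric (n, n) weight matrix with a zero diagonal.
    x: external input, which also sets the initial activations.
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()                                    # step 3: y_i = x_i
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(y)):     # step 4: units in random order
            yin = x[i] + y @ W[:, i]                # step 5: yin_i = x_i + sum_j y_j w_ji
            new_yi = 1.0 if yin > 0 else (-1.0 if yin < 0 else y[i])
            changed = changed or (new_yi != y[i])
            y[i] = new_yi                           # step 6: y_i = f(yin_i)
        if not changed:                             # converged: no unit changed
            break
    return y

# The two-unit example that follows: w12 = w21 = -1
W = np.array([[0., -1.], [-1., 0.]])
print(hopfield_recall(W, [0, -1]))                  # settles at (1, -1)
```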
Example • training vector: (1 -1) • weights: w12 = w21 = (1)(-1) = -1, w11 = w22 = 0 • units y1, y2 receive external inputs x1, x2
Example • input (0 -1): update y1 = 0 + (-1)(-1) = 1 -> 1; update y2 = -1 + (1)(-1) = -2 -> -1 • input (1 -1): update y2 = -1 + (1)(-1) = -2 -> -1; update y1 = 1 + (-1)(-1) = 2 -> 1
Hopfield Theorems • Convergence is guaranteed. • The number of storable patterns is approximately n / (2 * log n) where n is the dimension of a vector
Bidirectional Associative Memory (BAM) • Heteroassociative Recurrent Net • Kosko, 1988 • Architecture: an x layer (x1 ... xn) and a y layer (y1 ... ym) connected in both directions through the same weights
Activation Function • f(yin) = 1, if yin > 0 • f(yin) = 0, if yin = 0 • f(yin) = -1 otherwise
Algorithm 1. initialize weights using Hebb rule 2. for each test vector do 3. present s to the x layer 4. present t to the y layer 5. while equilibrium is not reached 6. compute f(yin.j) 7. compute f(xin.i)
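A sketch of this bidirectional recall loop under the activation function above; initializing the y layer to zero when only an x pattern is presented is my assumption, as are the names:

```python
import numpy as np

def f(net):
    """Activation from the previous slide: 1 if > 0, 0 if = 0, -1 otherwise."""
    return np.sign(net)

def bam_recall(W, x, y, max_steps=100):
    """Bounce signals between the x and y layers until neither changes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(max_steps):
        new_y = f(x @ W)        # step 6: update the y layer from the x layer
        new_x = f(new_y @ W.T)  # step 7: update the x layer back from the y layer
        if np.array_equal(new_x, x) and np.array_equal(new_y, y):
            break               # step 5: equilibrium reached
        x, y = new_x, new_y
    return x, y

# The example that follows: W = [[2, -2], [2, -2]]
W = np.array([[2., -2.], [2., -2.]])
print(bam_recall(W, x=[1, 1], y=[0, 0]))   # x layer settles at (1 1), y layer at (1 -1)
```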
Example • s1 = (1 1), t1 = (1 -1) • s2 = (-1 -1), t2 = (-1 1)
S^T =       T =         W = S^T T =
1 -1         1 -1       2 -2
1 -1        -1  1       2 -2
Example • Architecture: weights w11 = 2, w12 = -2, w21 = 2, w22 = -2 (the W above) • present (1 1) to the x layer -> y = (1 -1) • present (1 -1) to the y layer -> x = (1 1)
Hamming Distance • Definition: Number of differing corresponding bits in two vectors • For example, H[(1 -1), (1 1)] = 1 • The average Hamming distance per component in this example is ½.
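A one-line check of this definition (the helper name is illustrative):

```python
import numpy as np

def hamming(u, v):
    """Number of corresponding positions where two bipolar vectors differ."""
    return int(np.sum(np.asarray(u) != np.asarray(v)))

print(hamming([1, -1], [1, 1]))   # -> 1
```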
About BAMs • Observation: Encoding is better when the average Hamming distance of the inputs is similar to the average Hamming distance of the outputs. • The memory capacity of a BAM is min(n-1, m-1).