ANNs (Artificial Neural Networks): The Perceptron
Perceptron • X is a (column) vector of inputs. • W is a (column) vector of weights. • g(X) = WᵀX + w0, where w0 is the bias or threshold weight.
Perceptron Usage: • training/learning: local vs. global minima; supervised vs. unsupervised • feedforward (also called testing, usage, or application): indicate class i if output g(X) > 0; not class i otherwise.
Perceptron The bias or threshold can be represented as simply another weight (w0) with a constant input of 1 (x0=1).
Perceptron • With the bias folded in as w0, g(X) = WᵀX = Σ wi xi (sum over i = 0..n). This is the dot product of the two vectors, W and X.
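A minimal Python sketch (not from the slides) of this feedforward step, with the bias folded into the weight vector as w0 and x0 = 1; the weight values are an arbitrary illustration:

```python
import numpy as np

def perceptron_output(W, X):
    """Feedforward pass: g(X) = W . X, with the bias folded in as w0.

    W and X are 1-D arrays whose first entries are w0 and x0 = 1.
    Returns 1 ("class i") if g(X) > 0, else 0.
    """
    g = np.dot(W, X)            # dot product of weight and input vectors
    return 1 if g > 0 else 0

# Example: weights [w0, w1, w2] and an augmented input [x0=1, x1, x2]
W = np.array([-0.5, 1.0, 1.0])
X = np.array([1.0, 0.0, 1.0])
print(perceptron_output(W, X))  # g = -0.5 + 0 + 1.0 = 0.5 > 0 -> 1
```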
Activation functions • linear • threshold • sigmoid • tanh
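These four functions might be rendered in Python as follows; this is an illustrative sketch, and the particular forms (e.g., a step at 0) are common conventions rather than definitions from the slides:

```python
import numpy as np

def linear(x):
    return x                            # identity: output equals net input

def threshold(x):
    return np.where(x > 0, 1, 0)        # hard step at 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # squashes input to (0, 1)

def tanh(x):
    return np.tanh(x)                   # squashes input to (-1, 1)
```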
Perceptron training/learning • Initialize weights (including the threshold, w0) to random numbers in [-0.5, +0.5] (uniformly distributed). • Select a training input vector, X, and its desired output, d. • Calculate the actual output, y. • Learn: W ← W + η(d − y)X. (Only perform this step when the output is incorrect.) Here η is in [0..1] and is the gain or learning rate.
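A sketch of this procedure in Python, writing the learning rate η as eta; the epoch limit, stopping rule, and the name train_perceptron are assumptions for illustration:

```python
import numpy as np

def train_perceptron(samples, eta=0.1, epochs=100, seed=None):
    """samples: list of (X, d) pairs, with X already augmented so x0 = 1."""
    rng = np.random.default_rng(seed)
    n = len(samples[0][0])
    W = rng.uniform(-0.5, 0.5, size=n)        # step 1: random weights, incl. w0
    for _ in range(epochs):
        errors = 0
        for X, d in samples:                  # step 2: pick a training pair
            y = 1 if np.dot(W, X) > 0 else 0  # step 3: actual output
            if y != d:                        # step 4: learn only on error
                W = W + eta * (d - y) * X     # W <- W + eta*(d - y)*X
                errors += 1
        if errors == 0:                       # whole set classified correctly
            break
    return W
```

For example, the AND function can be presented as samples = [(np.array([1, a, b]), a & b) for a in (0, 1) for b in (0, 1)].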
“The proof of the perceptron learning theorem (Rosenblatt 1962) demonstrated that a perceptron could learn anything it could represent.” [3] • So what can it represent? • Any problem that is linearly separable. • Are all problems linearly separable?
Consider a binary function of n binary inputs. • “A neuron with n binary inputs can have 2^n different input patterns, consisting of ones and zeros. Because each input pattern can produce two different binary outputs, 1 and 0, there are 2^(2^n) different functions of n variables.” [3] • How many of these are separable? Not many! For n=6, 2^(2^6) = 2^64 ≈ 1.8x10^19, but only 5,028,134 are linearly separable. • AND and OR are linearly separable but XOR is not!
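This is easy to check empirically. The sketch below (an illustration, reusing the learning rule above) trains a single perceptron on each truth table; it converges for AND and OR but can never classify all four XOR patterns, since no separating weight vector exists:

```python
import numpy as np

def train_and_test(truth_table, eta=0.1, epochs=1000):
    """truth_table: list of ((x1, x2), d) pairs. Returns True if the
    trained perceptron classifies every pattern correctly."""
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.5, 0.5, size=3)                   # [w0, w1, w2]
    data = [(np.array([1.0, x1, x2]), d) for (x1, x2), d in truth_table]
    for _ in range(epochs):
        for X, d in data:
            y = 1 if np.dot(W, X) > 0 else 0
            W = W + eta * (d - y) * X                    # learn on error
    return all((1 if np.dot(W, X) > 0 else 0) == d for X, d in data)

AND = [((0,0),0), ((0,1),0), ((1,0),0), ((1,1),1)]
OR  = [((0,0),0), ((0,1),1), ((1,0),1), ((1,1),1)]
XOR = [((0,0),0), ((0,1),1), ((1,0),1), ((1,1),0)]
for name, tt in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print(name, "learned:", train_and_test(tt))
# -> AND learned: True, OR learned: True, XOR learned: False
```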
How about adding additional layers? • “Multilayer networks provide no increase in computational power over a single-layer network unless there is a nonlinear function between layers.” [3] • That’s because matrix multiplication is associative. • (XW1)W2 = X(W1W2)
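A quick numeric illustration of the associativity point, with arbitrarily chosen shapes: two purely linear layers compute exactly the same mapping as one layer whose weight matrix is the product W1W2:

```python
import numpy as np

rng = np.random.default_rng(42)
X  = rng.standard_normal((4, 3))   # 4 input vectors of dimension 3
W1 = rng.standard_normal((3, 5))   # first linear layer
W2 = rng.standard_normal((5, 2))   # second linear layer

two_layers = (X @ W1) @ W2         # applied layer by layer
one_layer  = X @ (W1 @ W2)         # collapsed into a single layer
print(np.allclose(two_layers, one_layer))  # True
```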
“It is natural to ask if every decision can be implemented by such a three-layer network … The answer, due ultimately to Kolmogorov …, is “yes” – any continuous function from input to output can be implemented in a three-layer net, given sufficient number of hidden units, proper nonlinearities, and weights.” [1]
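As a concrete tie-in to the XOR discussion: once a nonlinearity (here a hard threshold) sits between layers, a small two-layer net does implement XOR. The weights below are one illustrative hand-picked choice, not taken from the slides:

```python
def step(x):                              # hard threshold nonlinearity
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    h_or  = step(x1 + x2 - 0.5)           # hidden unit computes OR
    h_and = step(x1 + x2 - 1.5)           # hidden unit computes AND
    return step(h_or - h_and - 0.5)       # output: OR and not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))      # prints 0, 1, 1, 0
```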
Note the threshold nodes. Figure from “Usefulness of artificial neural networks to predict follow-up dietary protein intake in hemodialysis patients,” http://www.nature.com/ki/journal/v66/n1/fig_tab/4494599f1.html
References • [1] R.O. Duda, P.E. Hart, D.G. Stork, “Pattern Classification,” John Wiley and Sons, 2001. • [2] S.K. Rogers, M. Kabrisky, “An Introduction to Biological and Artificial Neural Networks for Pattern Recognition,” SPIE Optical Engineering Press, 1991. • [3] P.D. Wasserman, “Neural Computing,” Van Nostrand Reinhold, 1989.