Topics in Machine Learning
4th lecture: Perceptron
Definition
motivated by the biological neuron [figure: schematic of a biological neuron]
Definition
the basic model: inputs x1, ..., xn are weighted by w1, ..., wn and summed to wᵀx; the weighted sum is compared to a threshold/bias b, and the activation is H(wᵀx − b), where H, the perceptron function, is the Heaviside step function.
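A minimal sketch of this unit in Python (not part of the lecture; the names heaviside and perceptron_output and the numbers are mine):

import numpy as np

def heaviside(z):
    """Heaviside step function: 1 if z >= 0, else 0 (the value at exactly 0 is a convention)."""
    return 1 if z >= 0 else 0

def perceptron_output(w, b, x):
    """Output H(w^T x - b) of a perceptron with weights w and threshold/bias b."""
    return heaviside(np.dot(w, x) - b)

# example: a 3-input perceptron
w = np.array([0.5, -1.0, 2.0])
b = 0.3
x = np.array([1.0, 0.0, 0.2])
print(perceptron_output(w, b, x))   # -> 1, since w^T x - b = 0.9 - 0.3 = 0.6 >= 0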
Definition
geometry: the perceptron realizes a linear separation boundary, the hyperplane wᵀx = b with normal vector w; in two dimensions it crosses the axes at b/w1 and b/w2.
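A quick numeric check of this geometry (the values of w and b below are illustrative only):

import numpy as np

w = np.array([2.0, 4.0])          # weights w1, w2
b = 4.0                           # threshold/bias

# the boundary w^T x = b crosses the axes at (b/w1, 0) and (0, b/w2)
print(b / w[0], b / w[1])         # -> 2.0 1.0

# points on the boundary give w^T x - b = 0, points on either side flip the sign
for x in [np.array([2.0, 0.0]), np.array([3.0, 0.5]), np.array([0.0, 0.0])]:
    print(x, np.dot(w, x) - b)    # -> 0.0 (on boundary), 4.0 (positive side), -4.0 (negative side)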
Task
• learn a binary classification f: ℝⁿ → {0, 1}
• given examples (x, y) in ℝⁿ × {0, 1}, i.e. positive/negative examples
• evaluation: mean number of misclassifications on a test set (a short sketch follows below)
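A small sketch of this evaluation, assuming labels in {0, 1}; the helpers predict and test_error and the test data are illustrative, not from the lecture:

import numpy as np

def predict(w, b, X):
    """Predict labels in {0, 1} for the rows of X."""
    return (X @ w - b >= 0).astype(int)

def test_error(w, b, X_test, y_test):
    """Mean number of misclassifications on the test set."""
    return np.mean(predict(w, b, X_test) != y_test)

# hypothetical test set in R^2 x {0, 1}
X_test = np.array([[0.0, 1.0], [2.0, 2.0], [1.0, -1.0]])
y_test = np.array([0, 1, 0])
print(test_error(np.array([1.0, 1.0]), 1.5, X_test, y_test))   # -> 0.0 for this toy set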
Some basics
• we can simulate the bias by an on-neuron: H(wᵀx − b) = H((w, −b)ᵀ(x, 1) − 0)
• for any finite set, we can assume that no point lies on the boundary
• we can assume that a solution classifies all points correctly with margin 1: the margin is min_x |wᵀx|; if we know |wᵀx| ≥ ε for all x, then |(w/ε)ᵀx| ≥ 1 (a small sketch of both tricks follows below)
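A small sketch of both tricks (the arrays and numbers are illustrative, not from the slides):

import numpy as np

# simulate the bias by an on-neuron: H(w^T x - b) = H((w, -b)^T (x, 1))
w, b = np.array([0.5, -1.0]), 0.3
x = np.array([1.0, 2.0])
w_aug = np.append(w, -b)               # (w, -b)
x_aug = np.append(x, 1.0)              # (x, 1)
print(np.isclose(np.dot(w, x) - b, np.dot(w_aug, x_aug)))   # -> True

# rescale the weight vector so that min_x |w^T x| becomes 1 (margin 1)
X = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 1.0], [-1.0, -1.0, 1.0]])   # augmented points
eps = np.min(np.abs(X @ w_aug))        # margin of w_aug on this set
w_scaled = w_aug / eps
print(np.min(np.abs(X @ w_scaled)))    # -> 1.0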
Perceptron learning algorithm (Rosenblatt, 1962)
• simulate the bias as an on-neuron
• define the error signal δ(w, x) = y − H(wᵀx), which is 0 for correctly classified examples and ±1 for misclassified ones
init w;
repeat while some x with δ(w, x) ≠ 0 exists:
  w := w + δ(w, x)∙x;
(a Python sketch of the algorithm follows below)
example: blackboard
→ Hebbian learning
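A compact Python sketch of the algorithm under these conventions (the function name perceptron_train and the OR toy data are mine, not from the lecture):

import numpy as np

def perceptron_train(X, y, max_epochs=1000):
    """Perceptron learning on augmented inputs X (bias simulated by an on-neuron);
    y contains labels in {0, 1}."""
    w = np.zeros(X.shape[1])                      # init w
    for _ in range(max_epochs):
        updated = False
        for x_i, y_i in zip(X, y):
            delta = y_i - (np.dot(w, x_i) >= 0)   # error signal delta(w, x) in {-1, 0, +1}
            if delta != 0:                        # x_i is misclassified
                w = w + delta * x_i               # w := w + delta(w, x) . x
                updated = True
        if not updated:                           # no x with delta(w, x) != 0 exists
            return w
    return w

# toy example: the OR function, inputs augmented with a constant 1 (on-neuron)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w = perceptron_train(X, y)
print((X @ w >= 0).astype(int))    # -> [0 1 1 1]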
Hebbian learning
General Hebbian learning (psychology, D. O. Hebb): increase the connection strength for similar signals and decrease the strength for dissimilar signals.
The weight adaptation of the perceptron learning rule follows this paradigm: for misclassified examples, w := w + δ(w, x)∙x
Perceptron convergence theorem
Theorem: The perceptron algorithm converges after a finite number of steps if a solution exists.
Proof: Assume w* is a solution with |w*ᵀx| ≥ 1 for all x. Denote by w_k the weights in the k-th step of the algorithm. Show by induction (details: blackboard):
• w*ᵀw_k ≥ w*ᵀw_0 + k (the scalar product with the solution grows)
• |w_k|² ≤ |w_0|² + k·max|x|² (the length is restricted)
Hence: w*ᵀw_0 + k ≤ w*ᵀw_k ≤ |w*| |w_k| ≤ |w*| (|w_0|² + k·max|x|²)^(1/2), where the second inequality is Cauchy-Schwarz.
Perceptron convergence theorem
This yields two curves as functions of k: the lower bound w*ᵀw_0 + k grows linearly, while the upper bound |w*| (|w_0|² + k·max|x|²)^(1/2) grows only like √k. The linear curve eventually exceeds the square-root curve, so the inequality can hold for only finitely many k; by that point the algorithm has converged. [plot: both curves over k, crossing point marked "algorithm converged"]
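As a worked consequence (assuming w_0 = 0 for simplicity, which the slide does not state but is the usual initialization), comparing the two curves gives an explicit bound on the number k of updates:

\[
  k \;\le\; w^{*T} w_k \;\le\; |w^*|\,|w_k| \;\le\; |w^*| \sqrt{k \cdot \max_x |x|^2}
  \quad\Longrightarrow\quad
  k \;\le\; |w^*|^2 \cdot \max_x |x|^2 .
\]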
Perceptron - theory
For a solvable training problem:
• the perceptron algorithm converges,
• the number of steps can be exponential,
• alternative formulation: linear programming (find x which solves Ax ≤ b); polynomial algorithms exist (Khachiyan/Karmarkar algorithms; on average the simplex method also performs well) (a sketch of this reformulation follows below)
• generalization ability: scales with the input dimension (→ learning theory, later session)
Only linearly separable problems can be solved with the perceptron, since its classification boundary is linear.
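A sketch of the linear-programming reformulation using scipy's linprog (my choice of solver, not the lecture's): encode wᵀx_i ≥ 1 for positive and wᵀx_i ≤ −1 for negative examples as Aw ≤ b and ask for any feasible point.

import numpy as np
from scipy.optimize import linprog

# augmented training set (last component 1 simulates the bias), labels in {0, 1}
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])

# want w^T x_i >= 1 for y_i = 1 and w^T x_i <= -1 for y_i = 0,
# i.e. -s_i * (x_i^T w) <= -1 with signs s_i in {+1, -1}  ->  A w <= b
s = np.where(y == 1, 1.0, -1.0)
A = -s[:, None] * X
b = -np.ones(len(y))

# feasibility problem: minimize the zero objective subject to A w <= b (unbounded variables)
res = linprog(c=np.zeros(X.shape[1]), A_ub=A, b_ub=b, bounds=(None, None))
print(res.status)                      # 0: the LP is feasible, i.e. a separating w was found
print((X @ res.x >= 0).astype(int))    # -> [0 1 1 1]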
Perceptron - theory
Problems which are not linearly separable:
• e.g. XOR
• the perceptron algorithm cannot find a solution; instead a cycle will be observed (perceptron cycle theorem: the same weight vector will be observed twice during the algorithm)
• if the examples are chosen randomly, a solution that is as good as possible is found after some time → pocket algorithm: store the best solution found so far (a sketch follows below)
• finding an optimal solution in the presence of errors is NP-hard (it cannot even be approximated within any given constant factor)
example: blackboard
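A sketch of the pocket algorithm (function name and data are illustrative): run ordinary perceptron updates on randomly chosen examples, but keep in the "pocket" the weight vector with the fewest training errors seen so far.

import numpy as np

def pocket_train(X, y, n_updates=1000, seed=0):
    """Pocket algorithm on augmented inputs X with labels y in {0, 1}."""
    rng = np.random.default_rng(seed)

    def n_errors(w):
        return int(np.sum((X @ w >= 0).astype(int) != y))

    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), n_errors(w)
    for _ in range(n_updates):
        i = rng.integers(len(X))                 # choose an example at random
        delta = y[i] - (np.dot(w, X[i]) >= 0)
        if delta != 0:
            w = w + delta * X[i]                 # ordinary perceptron update
            err = n_errors(w)
            if err < best_err:                   # put the better weights in the pocket
                best_w, best_err = w.copy(), err
    return best_w, best_err

# XOR is not linearly separable: the plain perceptron cycles, the pocket keeps the best w
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
w, err = pocket_train(X, y)
print(err)   # best error count found; one misclassification is the optimum for XOR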
Perceptron - history
1943: McCulloch/Pitts propose artificial neurons and show the universal computation ability of circuits of such neurons
1949: Hebb paradigm proposed
1958: Rosenblatt perceptron (= perceptron + fixed preprocessing with masks), learning algorithm, used for picture recognition
1960: Widrow/Hoff: Adaline, company: Memistor Corporation
1969: Minsky/Papert show the restrictions of the Rosenblatt perceptron with respect to its representational abilities
→ we need more powerful systems → Werbos: backpropagation, Vapnik: support vector machine