
Regression, Artificial Neural Networks 16/03/2016


Presentation Transcript


  1. Regression, Artificial Neural Networks 16/03/2016

  2. Regression

  3. Regression • Supervised learning: based on training examples, learn a model which performs well on previously unseen examples. • Regression: forecasting real values

  4. Regression • Training dataset: {x_i, r_i}, r_i ∈ ℝ • Evaluation metric: least squared error
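
A standard way to write the least-squared-error criterion over the N training examples, with g denoting the learnt regression function (N and g are notational assumptions here, not symbols from the slide):

```latex
E(g \mid \mathcal{X}) = \sum_{i=1}^{N} \bigl( r_i - g(x_i) \bigr)^2
```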

  5. Linear regression

  6. Linear regression • g(x) = w1x + w0 • The squared error is minimised at the weights where its gradient is 0
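
Setting that gradient to zero and solving gives the usual closed-form estimates for one-dimensional inputs (a standard result, stated here for completeness; x̄ and r̄ denote the sample means):

```latex
w_1 = \frac{\sum_i (x_i - \bar{x})(r_i - \bar{r})}{\sum_i (x_i - \bar{x})^2},
\qquad
w_0 = \bar{r} - w_1 \bar{x}
```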

  7. Regression variants (+ MLE) • Bayes • k nearest neighbours: the mean or a distance-weighted average of the neighbours' values (see the sketch below) • Decision tree: a constant or various linear models at the leaves
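
A minimal sketch of the distance-weighted k-nearest-neighbour variant mentioned above; the function name, the inverse-distance weighting and the toy data are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def knn_regress(X_train, r_train, x, k=3, weighted=True, eps=1e-12):
    """Predict a real value for x from its k nearest training examples.

    weighted=False -> plain mean of the neighbours' target values
    weighted=True  -> inverse-distance weighted average
    """
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    if not weighted:
        return r_train[nearest].mean()
    w = 1.0 / (dists[nearest] + eps)              # closer neighbours get larger weights
    return np.dot(w, r_train[nearest]) / w.sum()

# toy usage: noisy samples of r = 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
r = np.array([0.1, 2.0, 3.9, 6.1])
print(knn_regress(X, r, np.array([1.5]), k=2))
```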

  8. Regression SVM

  9. Artificial Neural Networks

  10. Artificial neural networks • Motivation: simulating the information-processing mechanisms of the nervous system (the human brain) • Structure: a huge number of densely connected, mutually interacting processing units (neurons) • It learns from experience (training instances)

  11. Some neurobiology… • Neurons have many inputs and a single output • The output is either excited or not • The inputs from other neurons determine whether the neuron fires • Each input synapse has a weight

  12. A neuron in maths • Weighted average of the inputs: if the average is above a threshold T, the neuron fires (outputs 1); otherwise its output is 0 or −1
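
Written as a formula (a standard rendering of the thresholded unit just described, with the weighted combination denoted by the sum of w_i x_i):

```latex
o =
\begin{cases}
1 & \text{if } \sum_i w_i x_i > T \\
0 \text{ (or } -1\text{)} & \text{otherwise}
\end{cases}
```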

  13. Statistics about the human brain • #neurons: ~10^11 • Avg. #connections per neuron: 10^4 • Signal sending time: 10^-3 sec • Face recognition: 10^-1 sec

  14. Motivation (machine learning point of view) • Goal: non-linear classification • Linear machines are not satisfactory in several real-world situations • Which non-linear function family should we choose? • Neural networks: the latent non-linear patterns are learnt from the data

  15. Perceptron

  16. Multilayer perceptron = neural network • Different representations at the various layers

  17. Multilayer perceptron

  18. Feedforward neural networks • Connections only to the next layer • The weights of the connections (between two layers) can be changed • Activation functions are used to calculate whether the neuron fires • Three-layer network: • Input layer • Hidden layer • Output layer

  19. Network function • The network function of neuron j: where i is the index of the input neurons, and w_ji is the weight between neurons i and j • w_j0 is the bias
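
From the definitions above, the network function presumably has the usual form:

```latex
net_j = \sum_i w_{ji}\, x_i + w_{j0}
```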

  20. Activation function • The activation function is a non-linear function of the network value: y_j = f(net_j) (if it were linear, the whole network would be linear) • The sign activation function: [figure: step function jumping from 0 to 1 at the threshold T_j on the net_j axis]

  21. Differentiable activation functions • Enable gradient descent-based learning • The sigmoid function: [figure: sigmoid rising from 0 to 1 around T_j on the net_j axis]
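
Assuming the sigmoid on the slide is the logistic function, it and its derivative (the property that makes gradient-based learning convenient) are:

```latex
f(net_j) = \frac{1}{1 + e^{-net_j}},
\qquad
f'(net_j) = f(net_j)\,\bigl(1 - f(net_j)\bigr)
```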

  22. Output layer • where k is the index on the output layer and n_H is the number of hidden neurons • Binary classification: sign function • Multi-class classification: a neuron for each of the classes, the argmax is predicted (discriminant function) • Regression: linear transformation
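
With this notation, the output of neuron k is usually computed as below (for regression, f is simply a linear transformation, per the last bullet):

```latex
z_k = f\!\left( \sum_{j=1}^{n_H} w_{kj}\, y_j + w_{k0} \right),
\qquad k = 1, \dots, c
```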

  23. The y1 hidden unit calculates: y1 = +1 if x1 + x2 + 0.5 ≥ 0 (x1 OR x2), else y1 = −1 • y2 represents: y2 = +1 if x1 + x2 − 1.5 ≥ 0 (x1 AND x2), else y2 = −1 • The output neuron: z1 = 0.7y1 − 0.4y2 − 1, and sgn(z1) is 1 iff y1 = 1, y2 = −1, i.e. (x1 OR x2) AND NOT(x1 AND x2)
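
A tiny script that checks the truth table of this network, assuming the ±1 input encoding implied by the thresholds above; the weights are the ones from the slide:

```python
import itertools

def sgn(v):
    return 1 if v >= 0 else -1

for x1, x2 in itertools.product([-1, 1], repeat=2):
    y1 = sgn(x1 + x2 + 0.5)            # fires for x1 OR x2
    y2 = sgn(x1 + x2 - 1.5)            # fires for x1 AND x2
    z1 = sgn(0.7 * y1 - 0.4 * y2 - 1)  # fires iff y1 = 1 and y2 = -1
    print(x1, x2, "->", z1)            # +1 exactly when x1 XOR x2
```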

  24. General (three-layer) feedforward network (c output units) • The hidden units with their activation functions can express non-linear functions • The activation functions can differ between neurons (but in practice the same one is used everywhere)

  25. Universal approximation theorem • The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function • But the theorem does not give any hint on how to design activation functions for particular problems/datasets

  26. Training of neural networks(backpropagation)

  27. Training of neural networks • The network topology is given • The same activation function is used at each hidden neuron and it is given • Training = calibration of weights • on-line learning (epochs)

  28. Training of neural networks 1. Forward propagation: an input vector propagates through the network 2. Weight update (backpropagation): the weights of the network are changed in order to decrease the difference between the predicted and the gold-standard values

  29. Training of neural networks we can calculate (propagate back) the error signal for each hidden neuron

  30. t_k is the target (gold standard) value of output neuron k, z_k is the prediction at output neuron k (k = 1, …, c) and w are the weights • Error: the squared error over the output neurons (see the formula below) • Backpropagation is a gradient descent algorithm • The initial weights are random, then they are updated along the negative gradient (see below)
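
In this notation, the squared-error objective and the gradient-descent update referred to above take the standard form (η denotes the learning rate):

```latex
J(w) = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2,
\qquad
w \leftarrow w - \eta \, \frac{\partial J}{\partial w}
```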

  31. Backpropagation • The derivative of the error with respect to the weights between the hidden and output layers • The error signal for output neuron k (given on the next slide)

  32. Because net_k = w_k^T y, the change of the weights between the hidden and output layers is: Δw_kj = ηδ_k y_j = η(t_k − z_k) f′(net_k) y_j

  33. The gradient of the hidden units:

  34. The error signal of the hidden units: The weight change between the input and hidden layers:
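
In the same notation, the hidden-unit error signal and the input-to-hidden weight change take the standard backpropagation form:

```latex
\delta_j = f'(net_j) \sum_{k=1}^{c} w_{kj}\, \delta_k,
\qquad
\Delta w_{ji} = \eta \, \delta_j \, x_i
```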

  35. Backpropagation • Calculate the error signal for the output neurons and update the weights between the output and hidden layers (the weights into output neuron k) [figure: output – hidden – input layers]

  36. Backpropagation • Calculate the error signal for the hidden neurons [figure: output – hidden – input layers]

  37. Backpropagation • Update the weights between the input and hidden neurons (the weights into hidden neuron j) [figure: output – hidden – input layers]

  38. Training of neural networks (w initialised randomly)
Begin
  init: n_H; w, stopping criterion θ, learning rate η, m ← 0
  do
    m ← m + 1
    x^m ← a sampled training instance
    w_ji ← w_ji + η δ_j x_i;  w_kj ← w_kj + η δ_k y_j
  until ||∇J(w)|| < θ
  return w
End
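
A self-contained Python sketch of this online training loop for a three-layer regression network with sigmoid hidden units and a linear output; the layer size, learning rate, stopping threshold and the toy data are illustrative assumptions rather than values from the slides:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train(X, R, n_hidden=5, eta=0.1, theta=1e-4, max_epochs=10000, seed=0):
    """Online backpropagation for a 3-layer regression network."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W_hid = rng.normal(scale=0.1, size=(n_hidden, n_in + 1))  # input->hidden weights (incl. bias)
    W_out = rng.normal(scale=0.1, size=n_hidden + 1)          # hidden->output weights (incl. bias)
    for epoch in range(max_epochs):
        epoch_err = 0.0
        for x, r in zip(X, R):
            x1 = np.append(x, 1.0)                 # input plus bias term
            y = sigmoid(W_hid @ x1)                # hidden activations
            y1 = np.append(y, 1.0)                 # hidden plus bias term
            z = W_out @ y1                         # linear output (regression)
            delta_k = r - z                        # output error signal (f' = 1 for linear output)
            delta_j = y * (1 - y) * (W_out[:-1] * delta_k)  # hidden error signals
            W_out += eta * delta_k * y1            # update hidden->output weights
            W_hid += eta * np.outer(delta_j, x1)   # update input->hidden weights
            epoch_err += delta_k ** 2
        if epoch_err / len(X) < theta:             # stop when the mean squared error is small
            break                                  # (a simple stand-in for ||grad J(w)|| < theta)
    return W_hid, W_out

# toy usage: learn r = sin(x) on a few points
X = np.linspace(-2, 2, 40).reshape(-1, 1)
R = np.sin(X).ravel()
W_hid, W_out = train(X, R)
```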

  39. Stopping criteria • Stop if the change in J(w) is smaller than a threshold • Problem: estimating the change from a single training instance; use bigger batches for the change estimation

  40. Stopping based on the performance on a validation dataset • Use held-out instances (not used for training) to estimate the performance of the learnt model (to avoid overfitting) • Stop at the minimum of the error on the validation set

  41. Notes on backpropagation • It can get stuck in local minima • In practice, the local minima are usually close to the global one • Multiple training runs starting from various randomly initialized weights might help • we can take the trained network with the minimal error (on a validation set) • there are also voting schemes that combine the trained networks

  42. Questions of network design • How many hidden neurons? • Too few neurons cannot learn complex patterns • Too many neurons can easily overfit • Use a validation set? • Learning rate!?
