
Basic Models in Theoretical Neuroscience




Presentation Transcript


  1. Basic Models in Theoretical Neuroscience Oren Shriki 2010 Supervised Learning

  2. Supervised Learning • The learning is supervised by a ‘teacher’. • The network is exposed to samples and presents its output for each input. • The teacher presents the desired outputs. • The goal of learning is to minimize the difference between the network output and the desired output. • Usually, we define an error function and search for the set of connections that gives minimal error.

  3. Error Function • A popular approach is to choose a quadratic error function: Error = (desired output – network output)². • Any deviation results in a positive error. The error is zero only when there is no deviation.
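A minimal numeric illustration of the quadratic error in Python (the sample values below are made up for illustration):

import numpy as np

# Hypothetical desired outputs and network outputs for three samples
desired = np.array([1.0, 0.0, 1.0])
output = np.array([0.8, 0.2, 0.4])

# Quadratic error per sample: positive for any deviation, zero only for a perfect match
errors = (desired - output) ** 2
print(errors)         # [0.04 0.04 0.36]
print(errors.mean())  # mean quadratic error over the samples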

  4. Linear Network We are given P labeled samples (x^μ, y_d^μ), μ = 1, …, P, where y_d^μ is the desired output for input x^μ. The number of components in each sample is denoted by N. For simplicity, we first consider 2-dimensional inputs, for which the linear network computes y = w1 x1 + w2 x2.

  5. Linear Network The quadratic error is E = ½ Σμ (y_d^μ – w·x^μ)², summed over the P training samples. We are interested in the weights that give minimal error. For a linear network we can solve for them directly (no need for a learning process).

  6. Linear Network Graphically, as a function of the weights, the quadratic error is a paraboloid (a bowl-shaped surface).

  7. Linear Network Setting the derivative of the error with respect to each weight to zero gives a set of linear equations, Σj Cij wj = ui, where Cij = ⟨xi xj⟩ are the input correlations and ui = ⟨xi y_d⟩ is the correlation between input and output.
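A short Python sketch of this direct solution; the data below are synthetic, generated from assumed 'teacher' weights purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
N, P = 2, 50                        # input dimension and number of samples
X = rng.normal(size=(P, N))         # P synthetic input samples
w_teacher = np.array([1.5, -0.5])   # assumed weights that generate the desired outputs
y_d = X @ w_teacher                 # desired outputs

C = X.T @ X / P                     # input correlations  C_ij = <x_i x_j>
u = X.T @ y_d / P                   # input-output correlations  u_i = <x_i y_d>

w_opt = np.linalg.solve(C, u)       # solve C w = u for the optimal weights
print(w_opt)                        # recovers w_teacher up to numerical precision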

  8. Linear Network • By solving the equations we obtain the optimal weights. • How will the network perform with new examples? What will be its ability to generalize? • We expect the generalization ability to grow with the number of samples in the training set. • When there are not enough samples (# of samples < # of parameters), there are more unknowns than equations, and there are infinitely many solutions.

  9. Linear Network For N=2 and P=1, the error surface is a degenerate paraboloid: its minimum is attained along an entire line in weight space, so a single sample does not determine the two weights uniquely.

  10. Generalization vs. Training Error • We define the following types of errors: • Training error:The mean error on the training set. • Generalization error:The mean error over all possible samples.
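One way to see the two errors empirically, as a sketch: fit a linear network on a small training set and measure its error both there and on a large held-out set that stands in for "all possible samples" (the data and noise level are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(1)
N = 10
w_teacher = rng.normal(size=N)

def make_samples(P):
    X = rng.normal(size=(P, N))
    y = X @ w_teacher + 0.3 * rng.normal(size=P)   # noisy desired outputs
    return X, y

X_train, y_train = make_samples(20)     # small training set
X_test, y_test = make_samples(10000)    # large set approximating all possible samples

w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]   # weights with minimal training error

training_error = np.mean((y_train - X_train @ w) ** 2)
generalization_error = np.mean((y_test - X_test @ w) ** 2)
print(training_error, generalization_error)   # the training error is typically the smaller of the two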

  11. Generalization vs. Training Error Typically, the graphs have the following form (plot of the training error and of the generalization error of the optimal weight set from the training phase, as functions of the ratio of # of training samples to # of learned parameters).

  12. How to construct a learning algorithm to reduce the error? • Suppose the error as a function of the learned parameter has the following form (a curve of the error vs. the learned parameter). • The idea: we update the parameter in the direction opposite to the sign of the derivative – gradient descent.
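A sketch of gradient descent on a single learned parameter, using a made-up error function E(w) = (w − 3)²:

# Gradient descent on one parameter; E(w) = (w - 3)**2 is an illustrative error function
w = 0.0      # initial value of the learned parameter
eta = 0.1    # learning rate
for step in range(100):
    gradient = 2 * (w - 3)   # dE/dw
    w -= eta * gradient      # step opposite to the sign of the derivative
print(w)                     # approaches 3, the minimum of E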

  13. Types of Learning from Samples • Batch LearningThe weights are changed only after seeing a batch of samples. The samples can be recycled. • On-line LearningA learning step is performed after seeing each sample. Samples cannot be recycled. This type is more similar to natural learning.

  14. Online Learning in the Linear Network The gradient descent learning rule (the delta rule): each weight is changed by Δwi = η (y_d – y) xi, i.e. in proportion to the output error times the corresponding input.
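A Python sketch of this rule on synthetic data (the data-generating weights are an assumption for illustration); in batch learning, the same update would instead be averaged over a batch of samples before the weights are changed:

import numpy as np

rng = np.random.default_rng(2)
w_teacher = np.array([1.5, -0.5])   # assumed weights generating the desired outputs
w = np.zeros(2)                     # network weights, starting from zero
eta = 0.05                          # learning rate

# Online learning: one update per sample, samples are not recycled
for _ in range(2000):
    x = rng.normal(size=2)          # a fresh input sample
    y_d = w_teacher @ x             # desired output from the 'teacher'
    y = w @ x                       # network output
    w += eta * (y_d - y) * x        # delta rule: gradient descent on the quadratic error
print(w)                            # close to w_teacher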

  15. Online Learning in the Linear Network • The parameter η (eta) represents the learning rate (LR). • When the LR is large, the learning process is fast but inaccurate. • When the LR is small, the learning process is typically more accurate but slow.

  16. When the LR is small, the learning is smooth (plot of the error vs. the learned parameter).

  17. A large LR makes the learning noisy and can cause jumps from one minimum to another (plot of the error vs. the learned parameter).

  18. Online Learning in the Linear Network Example: MATLAB Demo

  19. Effect of LR in a Simple System • We next analyze a simple example of the effect of the LR on the convergence of the learning process. • The input samples are numbers drawn from a Gaussian random number generator. • For instance:9.5674, 8.3344, 10.1253, 10.2877, 8.8535, 11.1909, 11.1892, 9.9624, 10.3273, 10.1746 …

  20. Effect of LR in a Simple System • Target: predict the next number. • Input: x. • Learned parameter: W.

  21. Effect of LR in a Simple System • The quadratic error: • The gradient descent learning rule:

  22. Effect of LR in a Simple System MATLAB Demo
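The transcript does not spell out the model used in the demo; one simple possibility, assumed here, is a single parameter W updated toward each new sample by ΔW = η (x − W), so that W should converge to the mean of the Gaussian generator. A Python sketch of the LR effect under that assumption:

import numpy as np

rng = np.random.default_rng(3)
samples = rng.normal(loc=10.0, scale=1.0, size=500)   # Gaussian numbers around 10

def run(eta):
    W = 0.0
    for x in samples:
        W += eta * (x - W)    # assumed update rule: move W toward the current sample
    return W

for eta in (0.01, 0.5, 1.9, 2.1):
    print(eta, run(eta))
# eta = 0.01: slow, smooth convergence toward the mean (about 10)
# eta = 0.5 : fast convergence, mild fluctuations around the mean
# eta = 1.9 : noisy oscillation that still stays bounded
# eta = 2.1 : divergence, even though every step points toward the current sample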

  23. Effect of LR in a Simple System • How can the results be explained? • We shall perform “convergence in the mean” analysis. • (Board work)

  24. Effect of LR in a Simple System Conclusions from the analysis: • At critical values of the LR there are qualitative transitions in the nature of the learning process. • When the LR is too large, the learning will not converge although each step is performed in the right direction.
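A sketch of the "convergence in the mean" argument, under the same assumed update rule as above, ΔW = η (x − W), with ⟨x⟩ = μ: averaging the update gives ⟨W(t+1)⟩ − μ = (1 − η)(⟨W(t)⟩ − μ), so ⟨W(t)⟩ − μ = (1 − η)^t (⟨W(0)⟩ − μ). The mean therefore converges only when |1 − η| < 1, i.e. 0 < η < 2; below η = 1 it approaches μ monotonically, between 1 and 2 it oscillates around μ while still converging, and above η = 2 it diverges – qualitative transitions at critical values of the LR, as stated above.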

  25. Changing the LR with Time • In order to trade off speed and accuracy, we can reduce the LR with time. • Initially, when the LR is large, the learning process finds the deep regions of the error landscape. • The decrease in the LR allows the network to eventually settle in one of the local minima. • If the LR goes down too fast, the learning will not be able to sample the relevant parameter space. To prevent this, the LR is usually taken to be proportional to 1 over the time step (recall the harmonic series): η(t) = η0 / t.
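A sketch of such a schedule under the same assumed update rule; with η0 = 1, this particular update computes exactly the running average of the samples seen so far:

import numpy as np

rng = np.random.default_rng(4)
samples = rng.normal(loc=10.0, scale=1.0, size=2000)

W = 0.0
eta0 = 1.0
for t, x in enumerate(samples, start=1):
    eta = eta0 / t            # learning rate proportional to 1 over the time step
    W += eta * (x - W)        # assumed update rule, as in the earlier sketches
print(W)                      # settles near the mean; late samples barely move W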

  26. Life as Online Learning • During our life the plasticity of our brains changes. • Typically, we are more plastic as kids and over the years our plasticity goes down.

  27. How to deal with more complex problems? • In most real-world problems, the functions to be learned involve non-linearities, and thus non-linear neurons have to be used. • In addition, many interesting problems require multilayered networks.

  28. Non-linear Multilayer Network

  29. The transfer function of each neuron is usually sigmoidal, for example the logistic function g(h) = 1 / (1 + e^(–h))

  30. The Back-Propagation Algorithm (Board work)
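Since the board work is not in the transcript, here is only a generic Python sketch of back-propagation for a small two-layer sigmoidal network with a quadratic error, trained on the XOR problem (the architecture, data and learning rate are illustrative choices, not taken from the lecture):

import numpy as np

rng = np.random.default_rng(5)

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Toy task: XOR, with a 2-4-1 network (hidden layer of 4 sigmoidal units)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden weights and biases
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output weights and biases
eta = 0.5

for _ in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Backward pass for E = 0.5 * sum((y - Y)**2); the sigmoid derivative is g' = g(1 - g)
    delta2 = (y - Y) * y * (1 - y)            # error signal at the output layer
    delta1 = (delta2 @ W2.T) * h * (1 - h)    # error signal propagated back to the hidden layer

    # Gradient-descent updates
    W2 -= eta * h.T @ delta2;  b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * X.T @ delta1;  b1 -= eta * delta1.sum(axis=0)

print(np.round(y, 2))   # typically close to the XOR targets 0, 1, 1, 0 after training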

  31. Examples of Applications • Robotics (image processing, walking, navigation) • Prediction of economic processes • Character recognition • Medicine (diagnosis, prediction of outcome) • …

  32. Digit Recognition http://yann.lecun.com/exdb/lenet/index.html

  33. Digit Recognition (when the curve is parameterized) Inbal Pinto, Israeli Arts and Science Academy
