
Classification III


Presentation Transcript


  1. Classification III Lecturer: Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn

  2. Overview • Artificial Neural Networks

  3. Biological Motivation • 10^11: The number of neurons in the human brain • 10^4: The average number of connections of each neuron • 10^-3 seconds: The fastest switching time of neurons • 10^-10 seconds: The switching speed of computers • 10^-1 seconds: The time required to visually recognize your mother

  4. Biological Motivation • The power of parallelism • The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons. • The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations. • Sequential machines vs. Parallel machines • Group A • Using ANN to study and model biological learning processes. • Group B • Obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.

  5. Neural Network Representations

  6. Perceptrons [Diagram: a perceptron unit with inputs x_0 = 1, x_1, …, x_n, weights w_0, w_1, …, w_n, and a summation unit ∑ feeding a threshold.]

  7. Power of Perceptrons [Diagram: perceptron implementations of AND and OR; both use input weights w_1 = w_2 = 0.5, with bias weight w_0 = -0.8 for AND and w_0 = -0.3 for OR.]
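A minimal sketch (not part of the slides) checking that the weights above implement AND and OR, assuming the unit outputs 1 when w_0 + w_1·x_1 + w_2·x_2 > 0 and 0 otherwise:

  # Perceptron with bias weight w0 and input weights w1, w2; fires when the weighted sum is positive.
  def perceptron(w0, w1, w2, x1, x2):
      return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

  for x1 in (0, 1):
      for x2 in (0, 1):
          and_out = perceptron(-0.8, 0.5, 0.5, x1, x2)  # AND weights from the slide
          or_out = perceptron(-0.3, 0.5, 0.5, x1, x2)   # OR weights from the slide
          print(x1, x2, "AND:", and_out, "OR:", or_out)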

  8. Error Surface [Figure: the training error plotted as a surface over the weight space (w_1, w_2).]

  9. Gradient Descent • Batch learning • Learning rate
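The update equations themselves did not survive the transcript; the standard batch gradient-descent rule for a linear unit (as in Mitchell, Chapter 4, listed on slide 28) is:

  E(w) = ½ ∑_d (t_d − o_d)²                (squared error over all training examples d)
  Δw_i = −η ∂E/∂w_i = η ∑_d (t_d − o_d) x_id
  w_i ← w_i + Δw_i

where η is the learning rate.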

  10. Delta Rule
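The slide's equation is lost from the transcript; consistent with the worked example on the next slide, the per-example (delta rule) update is:

  Δw_i = η (t − o) x_i
  w_i ← w_i + Δw_i

where t is the target output, o the unit's actual output, x_i the i-th input, and η the learning rate.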

  11. Batch Learning GRADIENT-DESCENT(training_examples, η) • Initialize each w_i to some small random value. • Until the termination condition is met, Do • Initialize each Δw_i to zero. • For each <x, t> in training_examples, Do • Input the instance x to the unit and compute the output o. • For each linear unit weight w_i, Do • Δw_i ← Δw_i + η(t − o)x_i • For each linear unit weight w_i, Do • w_i ← w_i + Δw_i
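A runnable sketch of this batch procedure for a linear unit (NumPy-based; the function name, epoch count, and example data are illustrative assumptions, not from the slides):

  import numpy as np

  def gradient_descent(training_examples, eta=0.1, epochs=100):
      """Batch gradient descent for a linear unit o = w . x (x includes the bias input x_0 = 1)."""
      n = len(training_examples[0][0])
      w = np.random.uniform(-0.05, 0.05, n)      # small random initial weights
      for _ in range(epochs):                    # termination condition: fixed number of epochs
          delta_w = np.zeros(n)
          for x, t in training_examples:         # accumulate updates over the whole batch
              o = np.dot(w, x)                   # linear unit output
              delta_w += eta * (t - o) * np.asarray(x)
          w += delta_w                           # apply the summed update once per pass
      return w

  # Illustrative data: learn o = x_1, with bias input x_0 = 1 prepended to each instance.
  examples = [((1, 0), 0), ((1, 1), 1)]
  print(gradient_descent(examples))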

  12. Stochastic Learning For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0: Δw_i = η(t − o)x_i = 0.1 × (1 − 0) × 0.8 = 0.08 [Figure: positive (+) and negative (−) training examples scattered in the input space.]

  13. Stochastic Learning: NAND • threshold = 0.5 • learning rate = 0.1
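A minimal sketch of stochastic perceptron learning on NAND with the parameters above (assumed details, not from the slide: weights start at 0 and the unit fires when the weighted sum reaches the threshold):

  # Stochastic (per-example) perceptron learning on NAND, threshold 0.5, learning rate 0.1.
  THRESHOLD, ETA = 0.5, 0.1
  nand_data = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

  w = [0.0, 0.0, 0.0]                      # w[0] is the bias weight (input x_0 = 1)
  for epoch in range(50):                  # NAND is linearly separable, so this converges quickly
      for (x1, x2), t in nand_data:
          s = w[0] * 1 + w[1] * x1 + w[2] * x2
          o = 1 if s >= THRESHOLD else 0   # threshold unit output
          for i, xi in enumerate((1, x1, x2)):
              w[i] += ETA * (t - o) * xi   # update immediately after each example
  print(w)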

  14. Multilayer Perceptron

  15. XOR [Figure: the four XOR points plotted in the (p, q) plane, with the + and − classes at diagonally opposite corners.] Cannot be separated by a single line.

  16. XOR [Figure: XOR built in two layers: p and q feed an OR unit and a NAND unit, whose outputs feed an AND unit, i.e. XOR(p, q) = AND(OR(p, q), NAND(p, q)).]

  17. Hidden Layer Representations [Figure: a network with input, hidden, and output layers; the hidden units compute OR and NAND of the inputs, and the output unit computes the AND of the hidden outputs, reproducing XOR.]
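A small sketch showing that this two-layer construction reproduces XOR, reusing the AND/OR weights from slide 7 and symmetric NAND weights (the NAND weights are an assumption consistent with those slides):

  # Threshold unit: fires when bias plus weighted inputs is positive.
  def unit(w0, w1, w2, x1, x2):
      return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

  def xor(p, q):
      h_or = unit(-0.3, 0.5, 0.5, p, q)              # hidden unit 1: OR
      h_nand = unit(0.8, -0.5, -0.5, p, q)           # hidden unit 2: NAND (negated AND weights)
      return unit(-0.8, 0.5, 0.5, h_or, h_nand)      # output unit: AND of the hidden outputs

  for p in (0, 1):
      for q in (0, 1):
          print(p, q, "->", xor(p, q))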

  18. The Sigmoid Threshold Unit [Diagram: same structure as the perceptron (inputs x_0 = 1, x_1, …, x_n, weights w_0, …, w_n, summation ∑), but the weighted sum is passed through the sigmoid function instead of a hard threshold.]
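The defining equations are missing from the transcript; the standard sigmoid unit, consistent with the net_j = ∑ w_ji x_ji notation on the next slide, computes:

  net = ∑_i w_i x_i
  o = σ(net) = 1 / (1 + e^(−net))
  σ′(net) = σ(net) (1 − σ(net))

The simple form of the derivative is what makes the sigmoid convenient for gradient-based training.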

  19. Backpropagation Rule • x_ji = the i-th input to unit j • w_ji = the weight associated with the i-th input to unit j • net_j = ∑_i w_ji x_ji (the weighted sum of inputs for unit j) • o_j = the output of unit j • t_j = the target output of unit j • σ = the sigmoid function • outputs = the set of units in the final layer • Downstream(j) = the set of units directly taking the output of unit j as inputs

  20. Training Rule for Output Units
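The rule itself is missing from the transcript; for a sigmoid output unit k, the standard backpropagation form (as in Mitchell, Chapter 4) is:

  δ_k = o_k (1 − o_k)(t_k − o_k)
  Δw_kj = η δ_k x_kj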

  21. Training Rule for Hidden Units
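Again the equation is lost; the standard hidden-unit error term (Mitchell, Chapter 4), using the Downstream(h) notation from slide 19, is:

  δ_h = o_h (1 − o_h) ∑_{k ∈ Downstream(h)} w_kh δ_k
  Δw_hi = η δ_h x_hi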

  22. BP Framework • BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden) • Create a network with n_in inputs, n_hidden hidden units and n_out output units. • Initialize all network weights to small random numbers. • Until the termination condition is met, Do • For each <x, t> in training_examples, Do • Input the instance x to the network and compute the output o of every unit. • For each output unit k, calculate its error term δ_k. • For each hidden unit h, calculate its error term δ_h. • Update each network weight w_ji.
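A compact, runnable sketch of this framework for one hidden layer of sigmoid units (NumPy; the stochastic per-example updates, fixed epoch count, and XOR data are illustrative choices, not prescribed by the slide):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def backpropagation(training_examples, eta=0.5, n_in=2, n_out=1, n_hidden=3, epochs=5000):
      rng = np.random.default_rng(0)
      # Weight matrices include a bias column (input x_0 = 1 prepended to each layer's input).
      w_hidden = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
      w_output = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))
      for _ in range(epochs):                                       # termination: fixed number of epochs
          for x, t in training_examples:
              x = np.concatenate(([1.0], x))                        # add bias input
              h = sigmoid(w_hidden @ x)                             # hidden-layer outputs
              h_b = np.concatenate(([1.0], h))
              o = sigmoid(w_output @ h_b)                           # network outputs
              delta_o = o * (1 - o) * (t - o)                       # error terms for output units
              delta_h = h * (1 - h) * (w_output[:, 1:].T @ delta_o) # error terms for hidden units
              w_output += eta * np.outer(delta_o, h_b)              # update output-layer weights
              w_hidden += eta * np.outer(delta_h, x)                # update hidden-layer weights
      return w_hidden, w_output

  # Illustrative use: learn XOR.
  xor_examples = [(np.array([0., 0.]), np.array([0.])),
                  (np.array([0., 1.]), np.array([1.])),
                  (np.array([1., 0.]), np.array([1.])),
                  (np.array([1., 1.]), np.array([0.]))]
  w_h, w_o = backpropagation(xor_examples)
  for x, t in xor_examples:
      h_b = np.concatenate(([1.0], sigmoid(w_h @ np.concatenate(([1.0], x)))))
      print(x, "->", sigmoid(w_o @ h_b))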

  23. More about BP Networks … • Convergence and Local Minima • The search space is likely to be highly multimodal. • May easily get stuck at a local solution. • Need multiple trials with different initial weights. • Evolving Neural Networks • Black-box optimization techniques (e.g., Genetic Algorithms) • Usually better accuracy • Can do some advanced training (e.g., structure + parameter). • Xin Yao (1999) “Evolving Artificial Neural Networks”, Proceedings of the IEEE, pp. 1423-1447. • Representational Power • Deep Learning

  24. More about BP Networks … • Overfitting • Tends to occur during later iterations. • Use a validation dataset to terminate training when necessary. • Practical Considerations • Momentum (see the update rule below) • Adaptive learning rate • Small rate: slow convergence, easy to get stuck • Large rate: fast convergence, unstable [Figures: training vs. validation error plotted over training time, illustrating overfitting; error plotted against a weight.]
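The momentum heuristic mentioned above is commonly written as follows (this form follows Mitchell, Chapter 4; α is the momentum constant, 0 ≤ α < 1, and n indexes the training iteration):

  Δw_ji(n) = η δ_j x_ji + α Δw_ji(n − 1)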

  25. Beyond BP Networks • Elman Network [Figure: temporal XOR prediction task: input sequence 0 1 1 0 0 0 1 1 0 1 0 1 …, predictable targets ? 1 ? ? 0 ? ? 0 ? ? 1 ? … (every third bit is the XOR of the two preceding bits).]

  26. Beyond BP Networks Hopfield Network

  27. When does ANN work? • Instances are represented by attribute-value pairs. • Input values can be any real values. • The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. • The training samples may contain errors. • Long training times are acceptable. • Can range from a few seconds to several hours. • Fast evaluation of the learned target function may be required. • The ability to understand the learned function is not important. • Weights are difficult for humans to interpret.

  28. Reading Materials • Text Book • Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons Inc. • Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill. • http://page.mi.fu-berlin.de/rojas/neural/index.html.html • Online Demo • http://neuron.eng.wayne.edu/software.html • http://www.cbu.edu/~pong/ai/hopfield/hopfield.html • Online Tutorial • http://www.autonlab.org/tutorials/neural13.pdf • http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html • Wikipedia & Google

  29. Review • What is the biological motivation of ANN? • When does ANN work? • What is a perceptron? • How to train a perceptron? • What is the limitation of perceptrons? • How does ANN solve non-linearly separable problems? • What is the key idea of the Backpropagation algorithm? • What are the main issues of BP networks? • What are some examples of other types of ANN?

  30. Next Week’s Class Talk • Volunteers are required for next week’s class talk. • Topic 1: Applications of ANN • Topic 2: Recurrent Neural Networks • Hints: • Robot Driving • Character Recognition • Face Recognition • Hopfield Network • Length: 20 minutes plus question time

  31. Assignment • Topic: Training Feedforward Neural Networks • Technique: BP Algorithm • Task 1: XOR Problem • 4 input samples • 0 0 → 0 • 1 0 → 1 • Task 2: Identity Function • 8 input samples • 10000000 → 10000000 • 00010000 → 00010000 • … • Use 3 hidden units • Deliverables: • Report • Code (any programming language, with detailed comments) • Due: Sunday, 24 November • Credit: 15%
