1 / 74

Artificial neural networks

Artificial neural networks. Ricardo Ñanculef Alegría Universidad Técnica Federico Santa María Campus Santiago. Learning from Natural Systems. Bio-inspired systems Ants colony Genetic algorithms Artificial neural networks The power of the brain Examples: vision, text-processing

habib
Download Presentation

Artificial neural networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Artificial neural networks Ricardo Ñanculef Alegría Universidad Técnica Federico Santa María Campus Santiago

  2. Learning from Natural Systems • Bio-inspired systems • Ants colony • Genetic algorithms • Artificial neural networks • The power of the brain • Examples: vision, text-processing • Other animals: dolphins, bats

  3. Modeling the Human Brainkey functional characteristics • Learning and generalization ability • Continuous adaptation • Robustesness and fault tolerance

  4. Modeling the Human Brainkey structural characteristics • Massive parallelism • Distributed knowledge representation: memory • Basic organization: networks of neurons receptors Neural nets effectors

  5. Modeling the Human Brainneurons

  6. Human Brain in numbers • Cerebral cortex: E11 neurons (more than the number of stars in the milky-way) • Massive connectivity: E3 to E4 connections per neuron (in total, E15 connections) • Time response E-3 seconds. Silicon chips E-9 seconds (one million times faster ) • Yet, human are more efficient than computers at computationally complex tasks. Why?

  7. Artificial Neural Networks “ A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in: • Knowledge is acquired by a learning process • Connection strengths between processing units are used to store the acquired knowledge. ” Simon Haykin, “Neural Networks, a comprehensive foundation”, 2nd Ed, Reprint 2005, Prentice Hall

  8. Artificial Neural Networks “ From the perspective of pattern recognition, neural networks can be regarded as an extension of the many conventional techniques which have been developed over several decades (…) for example, discriminant functions, logit ..” Christopher Bishop, “Neural Networks for Pattern Recogniton”, Reprint, 2005, Oxford University Press

  9. Artificial Neural Networks diverse applications • Pattern Classification • Clustering • Function Approximation: Regression • Time Series Forecasting • Optimization • Content-addressable Memory

  10. The beginnings • McCulloch and Pitts, 1943: “A logical calculus of the ideas immanent in nervous activity” • First neuron model based on simplifications about the brain behavior • binary incoming signals • Connection strengths: weights to each incoming signal • binary response: active or inactive • activation threshold or bias • Just some years earlier: boolean algebra

  11. The beginnings • The model activation threshold (bias) connection weights

  12. The beginnings • These neurons can compute logical operations

  13. The beginnings • Perceptron (1958). Rosenblatt proposes the use of “layers of neurons” as a computational tool. • Proposes a training algorithm • Emphasis in the learning capabilities of NN • McCulloch and Pitts derived to automata theory

  14. The perceptron • Architecture …

  15. The perceptron • Notation …

  16. Perceptron Rule • We have a set of patterns with desired responses … • If used for classification … number of neurons in the perceptron clase

  17. Perceptron Rule • Initialize the weights and the thresholds • Present a pattern vector • Update the weights according to • If used for classification … learning rate

  18. Separating hyperplanes

  19. Separating hyperplanes • Hyperplane: set L of points satisfying • For any pair of points lying in L • Hence, the normal vector to L is

  20. Separating hyperplanes • Signed distance of any point to L

  21. Separating hyperplanes • Consider a two-class classificaton problem • One class coded as +1 and one as -1 • An input is classified as the sign of the distance to the hyperplane • How to train the classifier?

  22. Separating hyperplanes • An idea: train to minimize the distance of the misclassified inputs to the hyperplane • Note this is very different to train with the quadratic loss of all the points

  23. Gradient Descent • Suppose we have to minimize on • Where is a vector • For example • Iterate:

  24. Stochastic Gradient Descent • Suppose we have to minimize on • Where is a random variable • We have samples of • Iterate:

  25. Separating hyperplanes • If M is fixed • Stochastic gradient descent

  26. Separating hyperplanes • For correctly classified inputs no correction on the parameters is applied • Now, note that

  27. Separating hyperplanes • Perceptron rule

  28. Perceptronconvergence theorem Theorem: If there exists a set of connection weights and activation threshold which is able to separate the two clases, the perceptron algorithm will converge to some solution in a finite number of steps and indepently of the initialization of the weights and bias.

  29. Perceptron • Conclusion: perceptron rule, with two clases, is a stochastic gradient descent algorithm that aims to minimize the distances of the misclassified examples to the hyperplane. • With more than two classes, the perceptron uses one neuron to model a class againts the others. • This is an actual perspective

  30. Delta Rule • Widrow and Hoff • It considers general activation functions

  31. Delta Rule • Update the weights according to …

  32. Delta Rule • Update the weights according to …

  33. Delta Rule • Can the perceptron rule be obtained as a special case from this rule? • Step function is not differentiable • Note that with this algorithm all the patterns are observed before correction, while with the Rosenblatt's algorithm each pattern induces a correction

  34. Perceptrons • Perceptrons and logistic regression • With more than 1 neuron: each neuron has the form of a logistic model of one class against the others.

  35. Neural Networks Death • Minsky: 1969, “Perceptrons”. xn y = 1 b x1 y = -1 ... x1 x2 x3 xn

  36. Neural Networks Death • A perceptron cannot learn the XOR

  37. Neural Networks renaissance • Idea: map the data to a feature space where the solution is linear

  38. Neural Networks renaissance • Problem: this transformation is problem dependent

  39. Neural Networks renaissance • Solution: multilayer perceptrons (FANN) • More biologically plausible • Internal layers learn the map

  40. Architecture

  41. Architecture: regression each output corresponds to a response

  42. Architecture: classification each output corresponds to a class, such that Training data has to be coded by 0-1 response variables

  43. Theorem: Let an admissible activation function and let be a compact subset of Hence, for any continuous function and for any Universal approximation Theorem

  44. Admissible activation functions Universal approximation Theorem

  45. norm extensions: other output activation functions other norms Universal approximation Theorem

  46. Fitting Neural Networks • The back-propagation algorithm: A generalization of the delta rule for multilayer perceptrons • It is a gradient descent algorithm for the quadratic loss function

  47. Back-propagation • Gradient descent generates a sequence of aproximations related as

  48. For Back-propagation Equations Why back propagation? …

  49. Back-propagation Algorithm • Initialize the weights and the thresholds • For each example i compute • Update the weights according to • Iterate 2 and 3 until convergence

  50. Stochastic Back-propagation • Initialize the weights and the thresholds • For each example i compute • Iterate 2 until convergence

More Related