
Artificial Neural Networks


halona

Presentation Transcript


  1. Artificial Neural Networks

  2. Overview • Computational units and architectures • Learning in perceptrons • Learning in Multilayer feed-forward nets

  3. Neural Nets • Composed of basic units and weighted links between them • The basic units (or nodes) are an idealization of neurons • Responsible for basic computations • The pattern of connections of the units determines the network architecture

  4. Computation at Units • Compute a 0-1 or a graded function of the weighted sum of the inputs: $a = g\left(\sum_j w_j x_j\right)$ • $g$ is the activation function

  5. Common Activation Functions • Step function: g(x)=1, if x >= t ( t is a threshold) g(x) = 0, if x < t • Sign function: g(x)=1, if x >= t ( t is a threshold) g(x) = -1, if x < t • Sigmoid function: g(x)= 1/(1+exp(-x))
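The three activation functions listed above can be sketched in a few lines of Python. The function names and the helper `unit_output` are illustrative choices, not part of the slides; the default threshold of 0 is also an assumption.

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x >= t, otherwise 0 (t is the threshold)."""
    return 1 if x >= t else 0

def sign(x, t=0.0):
    """Sign function: 1 if x >= t, otherwise -1."""
    return 1 if x >= t else -1

def sigmoid(x):
    """Sigmoid function: 1 / (1 + exp(-x)); may overflow for very negative x."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, g):
    """Apply activation g to the weighted sum of the inputs (hypothetical helper)."""
    return g(sum(w * x for w, x in zip(weights, inputs)))

print(step(0.5, t=0.5), sign(-1.0), sigmoid(0.0))
```

The step and sign functions give hard 0-1 or ±1 decisions, while the sigmoid gives a graded value strictly between 0 and 1.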

  6. Can Implement Boolean Functions • A unit can implement And, Or, and Not • Need to map True and False to numbers: • e.g. True = 1.0, False = 0.0 • (Exercise) Use a step function and show how to implement various simple Boolean functions • Combining the units, we can get any Boolean function of n variables • Can obtain logical circuits as a special case
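One possible answer to the exercise, as a minimal sketch: single step-function units for And, Or, and Not, with True = 1 and False = 0. The particular weights and thresholds are one choice among many, and the helper `unit` is a hypothetical name.

```python
def step(x, t):
    """Step function with threshold t."""
    return 1 if x >= t else 0

def unit(weights, t):
    """Build a threshold unit: fires (1) when the weighted input sum reaches t."""
    def fire(*inputs):
        return step(sum(w * x for w, x in zip(weights, inputs)), t)
    return fire

AND = unit([1, 1], t=1.5)   # both inputs must be 1 to reach 1.5
OR = unit([1, 1], t=0.5)    # a single 1 is enough to reach 0.5
NOT = unit([-1], t=-0.5)    # a negative weight inverts the input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))
```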

  7. Network Structures • Recurrent (cycles exist), more powerful as they can implement state, but harder to analyze. Examples: • Hopfield network, symmetric connections, interesting properties, useful for implementing associative memory • Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization

  8. Network Structures • Feedforward (no cycles): less powerful, easier to understand • Input units • Hidden layers • Output units • Perceptron: no hidden layer, so each output basically corresponds to one unit, i.e. a linear threshold function (ltf) • Ltf: defined by weights $w_1, \dots, w_n$ and a threshold $t$; its value is 1 iff $\sum_j w_j x_j \ge t$, and 0 otherwise

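  9. Perceptron Capabilities • Quite expressive: many, but not all, Boolean functions can be expressed. Examples: • conjunctions and disjunctions • more generally, can represent functions that are true if and only if at least k of the inputs are true: with True = 1 and False = 0, set every weight to 1 and the threshold to k, i.e. $\sum_j x_j \ge k$ (k = n gives the conjunction, k = 1 the disjunction) • Can’t represent XOR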
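The sketch below illustrates both claims for 0/1 inputs. `at_least_k` uses unit weights and threshold k, and `representable_on_grid` (a hypothetical helper) searches a small, arbitrarily chosen grid of weights and thresholds for a single threshold unit that matches a truth table. The search is only an illustration, not a proof that XOR is unrepresentable.

```python
from itertools import product

def ltf(weights, t, inputs):
    """Linear threshold function: 1 iff the weighted sum reaches the threshold t."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0

def at_least_k(k, inputs):
    """1 iff at least k of the 0/1 inputs are 1: unit weights, threshold k."""
    return ltf([1] * len(inputs), k, inputs)

def representable_on_grid(table, grid):
    """Search a finite grid of (w1, w2, t) for an ltf matching a 2-input truth table."""
    return any(
        all(ltf([w1, w2], t, x) == y for x, y in table.items())
        for w1, w2, t in product(grid, repeat=3)
    )

GRID = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0 (an arbitrary choice)
OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(at_least_k(2, (1, 0, 1)))                # 1: two of the three inputs are true
print(representable_on_grid(OR_TABLE, GRID))   # True: e.g. weights 1, 1 and threshold 0.5
print(representable_on_grid(XOR_TABLE, GRID))  # False: no grid point reproduces XOR
```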

  10. Representable Functions • Perceptrons have a monotonicity property: If a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of other input values) • Can’t represent functions where input interactions can cancel one another’s effect (e.g. XOR)

  11. Representable Functions • Can represent only linearly separable functions • Geometrically: only if there is a line (plane) separating the positives from the negatives • The good news: such functions are PAC learnable and learning algorithms exist

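  12. Linearly Separable • [Figure: positive (+) and negative (-) examples that a single line separates]

  13. NOT Linearly Separable • [Figure: arrangements of + and - examples that no single line separates]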

  14. The Perceptron Learning Algorithm • Example of current-best-hypothesis (CBH) search (so incremental, etc.): • Begin with a hypothesis (a perceptron) • Repeat over all examples several times • Adjust weights as examples are seen • Until all examples correctly classified or a stopping criterion reached

  15. Method for Adjusting Weights • One weight update possibility: • If classification correct, don’t change • Otherwise: • If false negative, add input: $w_j \leftarrow w_j + x_j$ • If false positive, subtract input: $w_j \leftarrow w_j - x_j$ • Intuition: for instance, if the example is positive, strengthen/increase the weights corresponding to the positive attributes of the example

  16. Properties of the Algorithm • In general, also apply a learning rate $\alpha$ (see book): $w_j \leftarrow w_j + \alpha\,(y - o)\,x_j$, where $y$ is the correct label and $o$ is the perceptron’s output • The adjustment is in the direction of minimizing error on the example • If the learning rate is appropriate and the examples are linearly separable, then after a finite number of iterations the algorithm converges to a linear separator
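A minimal sketch of the perceptron learning loop described on the last three slides, assuming 0/1 labels and a step unit with threshold t. Several details are assumptions made for illustration: the function names, the zero initial weights (the slides only say to begin with a hypothesis), the treatment of the threshold as a weight on a constant input of -1, and the default values of `alpha` and `max_epochs`.

```python
def predict(weights, t, x):
    """Linear threshold unit: 1 iff the weighted sum reaches the threshold t."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= t else 0

def train_perceptron(examples, alpha=0.5, max_epochs=100):
    """Repeat over all examples, adjusting weights only on misclassified ones."""
    weights = [0.0] * len(examples[0][0])
    t = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            err = y - predict(weights, t, x)  # +1 false negative, -1 false positive
            if err != 0:
                mistakes += 1
                # False negative: add the (scaled) input; false positive: subtract it.
                weights = [w + alpha * err * xi for w, xi in zip(weights, x)]
                t -= alpha * err  # threshold behaves like a weight on input -1
        if mistakes == 0:  # every example classified correctly
            break
    return weights, t

AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, t = train_perceptron(AND_EXAMPLES)
print([predict(weights, t, x) for x, _ in AND_EXAMPLES])
```

On linearly separable data such as And, the loop stops once a full pass makes no mistakes. On data that is not linearly separable it simply runs until `max_epochs`.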

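  17. Another Algorithm (least-sum-squares algorithm) • Define and minimize an error function • $S$ is the set of examples, $f$ is the ideal function, $h_{\mathbf{w}}$ is the linear function corresponding to the current perceptron • Error of the perceptron (over all examples): $E(\mathbf{w}) = \frac{1}{2} \sum_{e \in S} \bigl(f(e) - h_{\mathbf{w}}(e)\bigr)^2$ • Note: $h_{\mathbf{w}}(e) = \sum_j w_j e_j$ is the weighted sum before thresholding, so $E$ is a differentiable function of the weights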

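  18. Derivative of Error • Gradient (derivative) of E: $\nabla E = \left(\frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\right)$ • Take the steepest descent direction: $w_j \leftarrow w_j - \alpha \frac{\partial E}{\partial w_j}$ • $\frac{\partial E}{\partial w_j}$ is the gradient along $w_j$, $\alpha$ is the learning rate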

  19. Gradient Descent • The algorithm: pick an initial random hypothesis (perceptron), then repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient) • [Figure: error surface E, showing the gradient direction $\nabla E$ and the descent direction $-\nabla E$]
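A minimal sketch of batch gradient descent on the sum-of-squares error of a linear unit. It assumes the error is half the sum of squared differences between the label and the weighted sum. It also starts from zero weights rather than random ones so the run is reproducible, and it appends a constant input of 1 to each example in place of a threshold. All names and parameter values are illustrative.

```python
def linear(weights, x):
    """Weighted sum of the inputs (the perceptron's output before thresholding)."""
    return sum(w * xi for w, xi in zip(weights, x))

def error(weights, examples):
    """Half the sum of squared errors over all examples."""
    return 0.5 * sum((y - linear(weights, x)) ** 2 for x, y in examples)

def gradient(weights, examples):
    """dE/dw_j = -sum over examples of (y - linear(x)) * x_j."""
    return [
        -sum((y - linear(weights, x)) * x[j] for x, y in examples)
        for j in range(len(weights))
    ]

def gradient_descent(examples, alpha=0.1, steps=500):
    """Repeatedly step against the gradient, starting from zero weights."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(steps):
        grad = gradient(weights, examples)
        weights = [w - alpha * g for w, g in zip(weights, grad)]
    return weights

# And function; the first input is a constant 1 that plays the role of a bias.
AND_EXAMPLES = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = gradient_descent(AND_EXAMPLES)
print(w, error(w, AND_EXAMPLES))
print([1 if linear(w, x) >= 0.5 else 0 for x, _ in AND_EXAMPLES])
```

Thresholding the fitted linear output at 0.5 turns it back into a 0-1 classifier; that cut-off is a choice made for this example.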

  20. Gradient Calculation

  21. Derivation (cont.)
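Only the titles of the two derivation slides survive in this transcript. The following is a sketch of the standard calculation, not a recovery of the original slides. It assumes the error is $E(\mathbf{w}) = \frac{1}{2}\sum_{e \in S}(f(e) - h_{\mathbf{w}}(e))^2$ with $h_{\mathbf{w}}(e) = \sum_j w_j e_j$, where $f$ is the ideal function, $e_j$ is the j-th input of example $e$, and $\alpha$ is the learning rate; these symbols are assumptions.

```latex
\begin{align*}
\frac{\partial E}{\partial w_j}
  &= \frac{\partial}{\partial w_j}\,\frac{1}{2}\sum_{e \in S}\bigl(f(e) - h_{\mathbf{w}}(e)\bigr)^2 \\
  &= \sum_{e \in S}\bigl(f(e) - h_{\mathbf{w}}(e)\bigr)\,
     \frac{\partial}{\partial w_j}\bigl(f(e) - h_{\mathbf{w}}(e)\bigr) \\
  &= -\sum_{e \in S}\bigl(f(e) - h_{\mathbf{w}}(e)\bigr)\, e_j
     \qquad\text{since } \frac{\partial h_{\mathbf{w}}(e)}{\partial w_j} = e_j \\[4pt]
w_j &\leftarrow w_j - \alpha\,\frac{\partial E}{\partial w_j}
     = w_j + \alpha \sum_{e \in S}\bigl(f(e) - h_{\mathbf{w}}(e)\bigr)\, e_j
\end{align*}
```

Under these assumptions the update has the same shape as the perceptron rule: each weight moves in proportion to the error times the corresponding input.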

  22. Properties of the algorithm • Error function has no local minima (is quadratic) • The algorithm is a gradient descent method to the global minimum, and will asymptotically converge • Even if not linearly separable, can find a good (minimum error) linear classifier • Incremental?

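  23. A Third Method • Formulate the problem as a linear feasibility or linear optimization problem • Example: find weights $w_j$ and a threshold $t$ such that $\sum_j w_j x_j \ge t$ for every positive example and $\sum_j w_j x_j < t$ for every negative example • Can be solved in polynomial time (output none if no solution exists, or otherwise output a solution)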

  24. Multilayer Feed-Forward Networks • Multiple perceptrons, layered • Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units) • [Figure: the network, with its input layer, hidden layer, and output layer labeled]

  25. Power/Expressiveness • Can represent interactions among inputs (unlike perceptrons) • Two-layer networks can represent any Boolean function, and continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used • Learning algorithms exist, but with weaker guarantees than perceptron learning algorithms
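As a small illustration of the first point, XOR, which no single perceptron can represent, can be computed by a two-layer network of step units. The hand-picked weights below are one possible choice, assuming True = 1 and False = 0.

```python
def step(x, t):
    """Step function with threshold t."""
    return 1 if x >= t else 0

def xor_net(x1, x2):
    """Two hidden threshold units (Or and And) feeding one output unit."""
    h_or = step(x1 + x2, 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2, 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and, 0.5)   # Or but not And, i.e. XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The output unit gives the And unit a negative weight, so the hidden layer lets one input combination cancel the effect of another.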

  26. Back-Propagation • Similar to the perceptron learning algorithm and gradient descent for perceptrons • Problem to overcome: how to adjust internal links (how to distribute the “blame” or the error) • Assumption: internal units use nonlinear, differentiable functions • Sigmoid functions are convenient

  27. Back-Propagation (cont.) • Start with a hypothesis (network with random weights) • Repeat until a stopping criterion is met • For each example, compute the network output and, for each unit i, its error term $\delta_i$ • Update each weight $w_{ij}$ (weight of the link going from node i to node j): $w_{ij} \leftarrow w_{ij} + \alpha\, a_i\, \delta_j$, where $a_i$ is the output of unit i and $\alpha$ is the learning rate
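A minimal sketch of this loop for a network with one hidden layer of sigmoid units and a single sigmoid output, trained with squared error. Everything specific here is an assumption for illustration: the dictionary layout of the network, the constant 1.0 bias inputs, the function names, the seed, the learning rate, and the fixed number of passes used as the stopping criterion.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_net(n_inputs, n_hidden, rng):
    """Random initial weights; the extra weight per unit is for a bias input."""
    return {
        "hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                   for _ in range(n_hidden)],
        "output": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)],
    }

def forward(net, x):
    """Return the (bias-extended) inputs, hidden activations, and output."""
    inputs = list(x) + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, inputs))) for ws in net["hidden"]]
    out = sigmoid(sum(w * v for w, v in zip(net["output"], hidden + [1.0])))
    return inputs, hidden, out

def train_step(net, x, y, alpha):
    """One back-propagation update for a single example (x, y)."""
    inputs, hidden, out = forward(net, x)
    # Error terms: output unit first, then blame passed back to hidden units.
    delta_out = (y - out) * out * (1.0 - out)
    delta_hidden = [h * (1.0 - h) * net["output"][k] * delta_out
                    for k, h in enumerate(hidden)]
    # Each weight changes by alpha * (output of source unit) * (error term of target).
    for k, a in enumerate(hidden + [1.0]):
        net["output"][k] += alpha * a * delta_out
    for k, ws in enumerate(net["hidden"]):
        for i, a in enumerate(inputs):
            ws[i] += alpha * a * delta_hidden[k]

def total_error(net, examples):
    return 0.5 * sum((y - forward(net, x)[2]) ** 2 for x, y in examples)

XOR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = make_net(2, 2, random.Random(0))
print("error before:", total_error(net, XOR_EXAMPLES))
for _ in range(2000):
    for x, y in XOR_EXAMPLES:
        train_step(net, x, y, alpha=0.5)
print("error after:", total_error(net, XOR_EXAMPLES))
```

The hidden error terms are computed before any weight is changed, so every update in a step uses the same forward pass. Whether the run ends up fitting XOR depends on the random starting weights.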

  28. The Error Term

  29. Derivation • Write the error for a single training example; as before use sum of squared error (as it’s convenient for differentiation, etc): • Differentiate (with respect to each weight…) • For example, we get for weight connecting node j to output i
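The formulas for these steps did not survive in the transcript. As a sketch under assumed notation (not the original slide content): let $a_j$ be the output of node j, $in_i = \sum_j w_{ji} a_j$ the weighted input to output unit i, $a_i = g(in_i)$ its output, and $y_i$ the target value.

```latex
\begin{align*}
E &= \frac{1}{2}\sum_i (y_i - a_i)^2 \\
\frac{\partial E}{\partial w_{ji}}
  &= -(y_i - a_i)\,\frac{\partial a_i}{\partial w_{ji}}
   = -(y_i - a_i)\, g'(in_i)\, a_j
   = -a_j\,\delta_i,
  \qquad \delta_i = (y_i - a_i)\, g'(in_i) \\
\delta_j &= g'(in_j) \sum_i w_{ji}\,\delta_i
  \qquad\text{(error term of a hidden node } j\text{)}
\end{align*}
```

For the sigmoid, $g'(x) = g(x)\,(1 - g(x))$, which is one reason it is convenient: the derivative is available from the unit's own output.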

  30. Properties • Converges to a minimum, but could be a local minimum • Could be slow to converge (Note: Training a three-node net is NP-Complete!) • Must watch for over-fitting just as in decision trees (use validation sets, etc.) • Network structure? Often two layers suffice; start with relatively few hidden units

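  31. Properties (cont.) • Many variations to the basic back-propagation: e.g. use momentum, $\Delta w(n) = \alpha\, a\, \delta + \mu\, \Delta w(n-1)$, where $\Delta w(n)$ is the Nth update amount and $\mu$ is a constant • Reduce the learning rate $\alpha$ with time (applies to perceptrons as well)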
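Both variations can be sketched in a few lines. The function names, the default momentum constant, and in particular the decay schedule are hypothetical choices; the slide does not say how the reduction with time is done.

```python
def momentum_step(weight, gradient_term, previous_delta, mu=0.9):
    """Nth update = plain gradient term + mu * (N-1)th update; returns (weight, update)."""
    delta = gradient_term + mu * previous_delta
    return weight + delta, delta

def decayed_rate(alpha0, n, decay=0.01):
    """Hypothetical schedule: the learning rate shrinks as 1 / (1 + decay * n)."""
    return alpha0 / (1.0 + decay * n)

w, delta = 1.0, 0.0
for n in range(3):
    w, delta = momentum_step(w, gradient_term=0.5, previous_delta=delta, mu=0.5)
    print(n, w, delta, decayed_rate(0.5, n))
```

With a constant gradient term the updates grow from step to step, which is the point of momentum: consecutive steps in the same direction reinforce each other.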

  32. NN properties • Can handle domains with • continuous and discrete attributes • Many attributes • noisy data • Could be slow at training but fast at evaluation time • Human understanding of what the network does could be limited
