
Supervised Learning Artificial Neural Networks Support Vector Machines


Presentation Transcript


  1. CCEB Supervised Learning: Artificial Neural Networks, Support Vector Machines. John H. Holmes, Ph.D., Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine

  2. What’s on the agenda for today • Review of classification • Introduction to machine learning • Artificial neural networks • Support vector machines

  3. The Classification Problem • Separating surface: find the surface that best separates the two classes (A+ and A-)

  4. To do classification or prediction, you need to have the right data • Pool of data • Training data • Testing data • A class attribute • Categories must be mutually exclusive • Predictor attributes

  5. What is a class? • Defines or partitions a relation • May be dichotomous or polytomous • Is not continuous • Examples • Clinical status (Ill/Well, Dead/Alive) • Biological classification (varieties of genus, species, or order)

  6. Mining class comparisons • Goal: to discover descriptions in the data that distinguish one class from another • These descriptions are concepts! • Data in the classes must be comparable • Same attributes • Same value-system for each attribute • Same dimensions

  7. Some mechanistic details… • Training • Phase during which a system is trained • Focus on generalization • Testing • Phase during which a system is tested on novel cases

  8. A split-sample method • Randomly select cases from the dataset for the training set • The balance is the testing set (see the sketch below)
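
A minimal sketch of this split in Python (the 2/3 training fraction and the fixed seed are illustrative choices, not values from the slide):

    import random

    def split_sample(dataset, train_fraction=2/3, seed=0):
        # Randomly select cases for the training set; the balance is the testing set.
        cases = list(dataset)
        random.Random(seed).shuffle(cases)
        cut = int(len(cases) * train_fraction)
        return cases[:cut], cases[cut:]   # (training set, testing set)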

  9. Cross-validation • Split the dataset into N folds (usually 10) • Each fold in turn serves as the candidate fold for use in testing • The balance of the dataset is used for the training set • Repeat for all N folds
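
A sketch of the same idea in Python (assigning cases to folds round-robin is an assumption; shuffled or stratified assignment is equally common):

    def cross_validation_folds(dataset, n_folds=10):
        # Each fold serves once as the candidate testing set;
        # the balance of the dataset is used for training.
        for n in range(n_folds):
            test = [case for i, case in enumerate(dataset) if i % n_folds == n]
            train = [case for i, case in enumerate(dataset) if i % n_folds != n]
            yield train, test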

  10. Generic Machine Learning Model

  11. Instance-based learning • Testing cases (unknown class) are compared to training cases, one at a time • The training case closest to the testing case is used to output a predicted class for the testing case • You need a distance function • Euclidean most common • The square root of the sum of squared differences across attribute-value pairs • Manhattan distance • The sum of absolute differences across attribute-value pairs, without squaring
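
A minimal 1-nearest-neighbor sketch of the above, assuming each training case is an (attribute-vector, class) pair with numeric attributes:

    def euclidean(a, b):
        # Square root of the sum of squared attribute-value differences
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def manhattan(a, b):
        # Sum of absolute attribute-value differences, without squaring
        return sum(abs(x - y) for x, y in zip(a, b))

    def predict(test_case, training_cases, distance=euclidean):
        # The training case closest to the testing case supplies the predicted class
        attrs, cls = min(training_cases, key=lambda tc: distance(tc[0], test_case))
        return cls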

  12. Kernel-based learning • Kernel • A similarity function that maps a non-linear problem to a linear classifier • The idea is that non-linear data can be classified by a linear classifier • Linear classifiers are based on the dot product between two vectors • If you substitute the dot product with a kernel function, a linear classifier can be transformed into a non-linear classifier!
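
A sketch of that substitution (the support_cases, alphas, and bias here are hypothetical learned quantities, shown only to make the swap concrete):

    import math

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def rbf_kernel(a, b, gamma=1.0):
        # A similarity function: implicitly a dot product in a higher-dimensional space
        return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

    def decision(x, support_cases, alphas, bias, kernel=dot):
        # A linear decision function written purely in terms of dot products;
        # passing kernel=rbf_kernel turns the same classifier non-linear.
        score = sum(a * c * kernel(s, x)
                    for (s, c), a in zip(support_cases, alphas)) + bias
        return 1 if score > 0 else -1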

  13. Neural Networks • Set of connected input/output units • Three layers: Input, Hidden, Output • Connections have weights that indicate the strength of the link between their units (neurons) • Neural networks learn by adjusting the connection weights as a result of exposure to training cases • Methods: backpropagation, self-organization

  14. A Simple Neural Network • Inputs: x1 (Bitten), x2 (Rabies present), x3 (Animal captured), x4 (Animal vaccinated) • Hidden layer • Output: Treat Yes/No

  15. Characteristics of neural networks • Neurons are all-or-none devices • Firing depends on reaching some threshold • Networks rely on connections made between axons and dendrites • Synapses • Neurotransmitters • “Wiring”

  16. A biologic neuron

  17. A simulated neuron

  18. Neuronal structure and function • Input • Always 0 or 1, until multiplied by a: • Weight • Determines a neuron’s effect on another in a connection • Inputs multiplied by their weights are processed by an: • Adder • Sums the weighted inputs from all connected neurons for processing through a: • Threshold function • Determines the output of the neuron based on the summed, weighted inputs
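
In code, the chain above reduces to a few lines (a minimal sketch, assuming a step threshold function):

    def simulated_neuron(inputs, weights, threshold):
        summed = sum(i * w for i, w in zip(inputs, weights))   # the adder
        return 1 if summed > threshold else 0                  # the threshold function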

  19. So... • Neural nets are arithmetic constraint networks • Operation frames denote arithmetic constraints • Demon procedures propagate stimuli through the net

  20. Perceptrons: The simplest type of neural net

  21. Perceptrons, contd. • Threshold logic unit or ADALINE • One neuron • Only binary (0/1) inputs are allowed • Logic boxes intercede between inputs and weights to interpret the environment • The neuron sums the weighted inputs and reports an output based on the threshold • The main task is to learn the weights

  22. How an output is produced in a perceptron

  23. Thus... If I1 = 2, I2 = 1, w1 = .5, w2 = .3, and θ = -1, the output O of the perceptron is 1 because: O = (2)(.5) + (1)(.3) - 1 = .3 (> 0)
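
Checking the arithmetic in Python:

    I = [2, 1]; w = [.5, .3]; theta = -1
    net = sum(i_k * w_k for i_k, w_k in zip(I, w)) + theta   # (2)(.5) + (1)(.3) - 1 = .3
    O = 1 if net > 0 else 0                                  # .3 > 0, so the output is 1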

  24. Weight and threshold adjustment in perceptrons • Adjustments are made only when an error occurs in the output • Weight adjustment: wi(t+1) = wi(t) + Δwi(t), where Δwi(t) = (D - O)Ii (D = desired output, O = actual output)

  25. Weight and threshold adjustment in perceptrons, contd. • Threshold adjustment: θ(t+1) = θ(t) + Δθ(t), where Δθ(t) = (D - O)

  26. Types of error

  27. How the adjustment works... • If output O is correct, no change is made • If a false positive error • Each weight is adjusted by subtracting the corresponding value in the input pattern • Threshold is adjusted by subtracting 1 • If a false negative error • Each weight is adjusted by adding the corresponding value in the input pattern • Threshold is adjusted by adding 1

  28. Thus... • If a false positive error was made on the example (I1 = 2, I2 = 1, w1 = .5, w2 = .3, and θ = -1), the weights would have been adjusted as: w1(t+1) = .5 + (0 - 1)(2) = -1.5; w2(t+1) = .3 + (0 - 1)(1) = -.7; θ(t+1) = -1 + (0 - 1) = -2

  29. Training a perceptron: The pseudocode

    Do while any output incorrect
      For each training case x
        If output ox incorrect (dx - ox ≠ 0)
          If dx - ox = 1
            Add logic box output vector to weight vector
          Else
            Subtract logic box output vector from weight vector
        x = x + 1
    EndDo
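
A runnable Python version of this loop, using the update rules from slides 24-27 and assuming (as in the worked example above) that the raw input values stand in for the logic box outputs:

    def train_perceptron(cases, w, theta, max_passes=100):
        # cases: list of (input vector I, desired output D) pairs.
        # Applies wi(t+1) = wi(t) + (D - O)Ii and theta(t+1) = theta(t) + (D - O)
        # until every output is correct (or max_passes is exhausted).
        for _ in range(max_passes):
            all_correct = True
            for I, D in cases:
                O = 1 if sum(i * wi for i, wi in zip(I, w)) + theta > 0 else 0
                if D != O:                   # adjust only when an error occurs
                    w = [wi + (D - O) * i for wi, i in zip(w, I)]
                    theta = theta + (D - O)
                    all_correct = False
            if all_correct:
                break
        return w, theta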

  30. Two-layer, multiple-output network

  31. Multiple-layer, single-output network

  32. Multiple-layer, multiple-output network

  33. Backpropagation • Most common implementation of neural nets • Two-stage process • feed-forward activation from input to output layer • propagation of errors in the output backward to the input layer • Change w in proportion to the effect on the error observed at the outputs • Error=d-o • Where d=known class value, o=output from ANN

  34. Backpropagation requires hidden layers • Middle layers build an internal model of the way input patterns are related to the desired outputs • The knowledge representation is implicit in this model: it is the synapses (connectivity) that are the representation

  35. Hidden layers • As the number of hidden layers increases, the training error rate decreases • Due to increased flexibility in the network to fit the data

  36. Calculating the output of a hidden unit • Logistic function: Oj = 1 / (1 + e^(-netj)), where netj is the summed, weighted input to unit j • Output will be a real number between 0 and 1
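
In Python (a bare sketch; math.exp can overflow for large negative net, which a production version would guard against):

    import math

    def logistic(net):
        # Squashes the summed, weighted input to a real number between 0 and 1
        return 1 / (1 + math.exp(-net))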

  37. Weight and threshold adjustment in backpropagation • Training involves adjustment of each weight, proportional to the product of a learning rate (lrate), an error derivative (errdrv), and the input I • Because there are multiple layers, the input to unit j may be the output of a unit in the previous hidden layer, Oi

  38. Weight adjustment in backpropagation wij(t+1) = wij(t) + Δwij(t), where Δwij(t) = (lrate)(errdrv)j Oi

  39. Threshold adjustment in backpropagation θj(t+1) = θj(t) + Δθj(t), where Δθj(t) = (lrate)(errdrv)j

  40. Calculating the error derivatives • Units in the output layer: (errdrv)j = Oj(1 - Oj)(Dj - Oj) • Units in a hidden layer: multiply Oj(1 - Oj) by the weighted sum of the error derivatives of all k units connected to unit j in the next higher layer: (errdrv)j = Oj(1 - Oj) Σk (errdrv)k wjk
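
Both derivatives in Python (downstream is a hypothetical list of (errdrv_k, w_jk) pairs for the units that unit j feeds):

    def errdrv_output(O_j, D_j):
        # Error derivative for a unit in the output layer
        return O_j * (1 - O_j) * (D_j - O_j)

    def errdrv_hidden(O_j, downstream):
        # downstream: (errdrv_k, w_jk) pairs for the k units connected to j
        return O_j * (1 - O_j) * sum(e_k * w_jk for e_k, w_jk in downstream)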

  41. How backpropagation works: The pseudocode

    Initialize weights
    For each training case i
      Present input i
      Generate output oi
      Calculate error (di - oi)
      Do while output incorrect
        For each layer j, from output back to input
          Pass error back to each neuron n in layer j
          Modify weights in each neuron n in layer j
      EndDo
      i = i + 1 (or, more typically, i = Rnd(i))
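
A minimal NumPy sketch of the whole procedure for a single-hidden-layer network with logistic units, using the update rules from slides 36-40 (the fixed epoch count replaces the slide's do-while test, and the layer size, learning rate, and seed are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def logistic(net):
        return 1 / (1 + np.exp(-net))

    def train(X, D, n_hidden=4, lrate=0.5, epochs=1000):
        # X: case-by-attribute matrix; D: desired 0/1 outputs, one per case
        W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))   # input -> hidden
        t1 = np.zeros(n_hidden)                                   # hidden thresholds
        W2 = rng.normal(scale=0.5, size=(n_hidden, 1))            # hidden -> output
        t2 = np.zeros(1)                                          # output threshold
        for _ in range(epochs):
            for i in rng.permutation(len(X)):                      # i = Rnd(i)
                x = X[i:i+1]
                h = logistic(x @ W1 + t1)                          # feed-forward...
                o = logistic(h @ W2 + t2)                          # ...to the output layer
                err_o = o * (1 - o) * (D[i] - o)                   # output-layer errdrv
                err_h = h * (1 - h) * (err_o @ W2.T)               # hidden-layer errdrv
                W2 += lrate * h.T @ err_o; t2 += lrate * err_o[0]  # w(t+1) = w(t) + lrate*errdrv*O
                W1 += lrate * x.T @ err_h; t1 += lrate * err_h[0]
        return W1, t1, W2, t2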

  42. Problems with backpropagation • Gradient descent (ascent) is hill-climbing • Add a momentum term to the generalized delta rule • Allows sliding over local minima/maxima • Scaling to the problem domain • Increasing the number of hidden layers can cause degradation in performance

  43. Problems with backpropagation, contd. • Biological implausibility • Some believe that reverse neural pathways do not exist simultaneously with feed-forward pathways • Relies more on distant neurons for information than on local neurons

  44. Neural Networks: Summary • Advantages • Excellent performance on many databases • Good choice for predictive mining • Resistant to noisy data • Disadvantages • Require an a priori knowledge model • Require substantial parameterization • Training can require long periods • Knowledge is not easily represented

  45. Let’s look at an example

  46. Support Vector Machines • Blend linear models with instance-based learning • Basic principle • Select a number of critical boundary instances (support vectors) from each class • Build a linear discriminant function that separates the classes as widely as possible (a maximum-margin surface)
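
As a concrete illustration, scikit-learn's SVC exposes exactly these pieces (assuming scikit-learn is installed; the toy points are invented):

    from sklearn.svm import SVC

    X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]   # two small clusters
    y = [-1, -1, -1, 1, 1, 1]

    clf = SVC(kernel="linear").fit(X, y)
    print(clf.support_vectors_)     # the critical boundary instances
    print(clf.predict([[2, 2]]))    # classify a new case with the learned discriminant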

  47. But SVMs are more than just linear models! • Other, non-linear terms can be added • Decision boundaries are not linearly constrained • Quadratic, cubic, and other polynomial boundaries now possible! • How? • Use non-linear functions to transform the input • Thus, SVMs use linear models to implement non-linear class boundaries
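
For example, a one-dimensional problem where the class depends on x² cannot be separated by any single threshold on x, but a quadratic kernel handles it (invented toy data; the degree-2 polynomial kernel is one of several possible choices):

    from sklearn.svm import SVC

    X = [[-2], [-1], [0], [1], [2]]
    y = [1, 0, 0, 0, 1]             # class 1 at the extremes: not linearly separable in x

    quad = SVC(kernel="poly", degree=2).fit(X, y)   # kernel trick: quadratic boundary
    print(quad.predict([[-3], [0.5], [3]]))         # should recover the U-shaped rule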

  48. Linear Classifiers How would you classify these data?

  49. Linear Classifiers, contd. How would you classify these data?

  50. Linear Classifiers, contd. How would you classify these data?
