Feed-Forward Artificial Neural Networks
AMIA 2003, Machine Learning Tutorial
Constantin F. Aliferis & Ioannis Tsamardinos
Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University
Binary Classification Example
• Example: classifying breast cancer as malignant or benign from mammograms
• Predictor 1: lump thickness
• Predictor 2: single epithelial cell size
(Figure: training examples plotted with value of predictor 1 on the horizontal axis and value of predictor 2 on the vertical axis)
Possible Decision Areas
(Figures: several possible partitions of the predictor space into class regions, one class drawn as green triangles and the other as red circles; axes are value of predictor 1 and value of predictor 2)
Binary Classification Example
• The simplest non-trivial decision function is the straight line (in general, a hyperplane)
• One decision surface
• The decision surface partitions the space into two subspaces
(Figure: a straight line separating the two classes; axes are value of predictor 1 and value of predictor 2)
Specifying a Line
• Line equation: w2x2 + w1x1 + w0 = 0
• Classifier:
• If w2x2 + w1x1 + w0 ≥ 0, output one class (the green triangles)
• Else, output the other class (the red circles)
(Figure: the line w2x2 + w1x1 + w0 = 0 in the (x1, x2) plane; points with w2x2 + w1x1 + w0 > 0 fall on one side, points with w2x2 + w1x1 + w0 < 0 on the other)
Classifying with Linear Surfaces
• Use 1 for one class (green triangles)
• Use -1 for the other class (red circles)
• The classifier becomes sgn(w2x2 + w1x1 + w0) = sgn(w2x2 + w1x1 + w0x0), with x0 = 1 always
(Figure: the same line w2x2 + w1x1 + w0 = 0, with the two half-planes labeled by the sign of w2x2 + w1x1 + w0)
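A minimal sketch of this classifier in Python; the weight and input values below are made up purely for illustration and are not from the slides:

```python
def classify(x1, x2, w0, w1, w2):
    """Linear classifier: +1 on one side of the line w2*x2 + w1*x1 + w0 = 0, -1 on the other."""
    s = w2 * x2 + w1 * x1 + w0
    return 1 if s >= 0 else -1

# Illustrative weights and two example points
print(classify(x1=2.0, x2=3.0, w0=-4.0, w1=1.0, w2=1.0))  # 1  (point on the positive side)
print(classify(x1=0.5, x2=0.5, w0=-4.0, w1=1.0, w2=1.0))  # -1 (point on the negative side)
```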
The Perceptron
(Figures: a single perceptron unit, shown step by step. The inputs x0, x1, x2, x3, x4 at the bottom are the attributes of the patient to classify, with x0 = 1 always; each input is multiplied by the weight on its connection (the example uses the weights 3, 0, -2, 4, and 2); the weighted inputs are summed; the sum is passed through the sign function (here sgn(3) = 1); and the value at the top is the output: the classification of the patient as malignant or benign. The example input shown is x0 = 1, x1 = 0, x2 = 4, x3 = 3.1, x4 = 2.)
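The same computation in vector form, with the bias folded in as the constant input x0 = 1; the weights and input are again illustrative placeholders rather than the exact values on the slides:

```python
import numpy as np

def perceptron_output(w, x):
    """Perceptron forward pass: sgn(w . x), with the bias folded in as x[0] = 1."""
    s = np.dot(w, x)
    return 1 if s >= 0 else -1

# Illustrative weights w0..w4 and one input vector (x0 is always 1)
w = np.array([2.0, 4.0, -2.0, 0.0, 3.0])   # w0, w1, w2, w3, w4
x = np.array([1.0, 0.0, 4.0, 3.1, 2.0])    # x0=1, x1, x2, x3, x4
print(perceptron_output(w, x))              # prints 1 or -1 depending on the weighted sum
```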
Learning with Perceptrons
• Hypothesis space: the set of all weight vectors w (one weight per input, plus the bias weight w0)
• Inductive bias: prefer hypotheses that do not misclassify any of the training instances (or that minimize an error function)
• Search method: perceptron training rule, gradient descent, etc.
• Remember: the problem is to find "good" weights
Training Perceptrons
• Start with random weights
• Update them in an intelligent way to improve them
• Intuitively (here the perceptron outputs 1 but the true output is -1):
• Decrease the weights that increase the sum
• Increase the weights that decrease the sum
• Repeat for all training instances until convergence
(Figure: the example perceptron from the previous slides computes sgn(3) = 1, while the true output for this instance is -1)
Perceptron Training Rule
• wi ← wi + η (t − o) xi, where η is the learning rate
• t: output the current example should have
• o: output of the perceptron
• xi: value of predictor variable i
• t = o: no change (for correctly classified examples)
Perceptron Training Rule
• t = -1, o = 1: Δwi = −2ηxi, so the weighted sum will decrease
• t = 1, o = -1: Δwi = 2ηxi, so the weighted sum will increase
Perceptron Training Rule
• In vector form: w ← w + η (t − o) x
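A sketch of one application of the training rule in Python; the learning rate, weights, and example vector are illustrative assumptions:

```python
import numpy as np

def perceptron_update(w, x, t, eta=0.1):
    """One perceptron training step: w <- w + eta * (t - o) * x."""
    o = 1 if np.dot(w, x) >= 0 else -1   # current output of the perceptron
    return w + eta * (t - o) * x          # no change when t == o

# Illustrative example: the perceptron outputs 1 but the target is -1,
# so every weight with a positive input value is decreased.
w = np.array([2.0, 4.0, -2.0])            # w0, w1, w2
x = np.array([1.0, 0.5, 1.5])             # x0=1, x1, x2
print(perceptron_update(w, x, t=-1))
```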
Example of Perceptron Training: The OR Function
(Figure: the four training examples in the (x1, x2) plane; (0, 0) has class -1, while (0, 1), (1, 0), and (1, 1) have class 1)
Example of Perceptron Training: The OR Function
• Initial random weights define the line x1 = 0.5
(Figure: the vertical line x1 = 0.5; the classifier outputs -1 to its left and 1 to its right)
Example of Perceptron Training: The OR Function
• With the line x1 = 0.5, the only misclassified example is x1 = 0, x2 = 1
(Figure: the OR examples and the line x1 = 0.5, with the misclassified point highlighted)
Example of Perceptron Training: The OR Function
• Applying the training rule to the misclassified example x1 = 0, x2 = 1 changes the weights and therefore the decision line
• The new line crosses the axes at x1 = -0.5 (for x2 = 0) and x2 = -0.5 (for x1 = 0)
• So, the new line is x1 + x2 + 0.5 = 0
(Figure: the old line x1 = 0.5 and the new line through (-0.5, 0) and (0, -0.5))
Example of Perceptron Training: The OR Function
• Next iteration: the example x1 = 0, x2 = 0 (class -1) is now misclassified, so the weights are updated again
(Figure: the line through (-0.5, 0) and (0, -0.5), with the misclassified point highlighted)
Example of Perceptron Training: The OR Function
• Perfect classification
• No change occurs on the next pass, so training stops
(Figure: the final line separates (0, 0), the only -1 example, from the other three examples)
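A self-contained sketch of the whole procedure on the OR data in Python. The learning rate is an illustrative assumption, and the initial weights are one choice that reproduces the slide's starting line x1 = 0.5, so the intermediate steps may not match the slides exactly:

```python
import numpy as np

# OR training data: columns are x0 = 1 (bias input), x1, x2; targets use the 1 / -1 encoding
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([-1, 1, 1, 1])

w = np.array([-0.5, 1.0, 0.0])   # illustrative initial weights; they define the line x1 = 0.5
eta = 0.5                         # illustrative learning rate

for epoch in range(20):
    errors = 0
    for x, t in zip(X, T):
        o = 1 if np.dot(w, x) >= 0 else -1
        if o != t:
            w = w + eta * (t - o) * x   # perceptron training rule
            errors += 1
    if errors == 0:                     # converged: every example is classified correctly
        break

print("weights:", w, "epochs:", epoch + 1)
```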
Analysis of the Perceptron Training Rule
• The algorithm will always converge within a finite number of iterations if the data are linearly separable
• Otherwise, it may oscillate (no convergence)
Training by Gradient Descent
• Similar, but:
• Always converges
• Generalizes to training networks of perceptrons (neural networks)
• Idea:
• Define an error function
• Search for weights that minimize the error, i.e., find weights that zero the error gradient
Setting Up the Gradient Descent
• Mean squared error: E(w) = ½ Σd (td − od)², where the sum runs over the training examples d
• Minima exist where the gradient is zero: ∂E(w)/∂wi = 0 for every weight wi
The Sign Function is not Differentiable
(Figure: the OR example with the decision line x1 = 0.5; the classifier's output jumps abruptly from -1 to 1 across the line, so the error is not differentiable in the weights)
Use Differentiable Transfer Functions
• Replace the sign function sgn(a)
• with the sigmoid σ(a) = 1 / (1 + e^(−a))
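For a single sigmoid unit o_d = σ(w · x_d), the gradient of the mean squared error can be written out explicitly; this step is standard (it uses σ'(a) = σ(a)(1 − σ(a))) but is not spelled out on the slides:

```latex
\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
  = -\sum_d (t_d - o_d)\,\frac{\partial o_d}{\partial w_i}
  = -\sum_d (t_d - o_d)\, o_d\,(1 - o_d)\, x_{i,d}
```

Gradient descent then moves each weight a small step against this gradient: Δwi = −η ∂E/∂wi.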
Updating the Weights with Gradient Descent
• Each weight update goes through all training instances
• Each weight update is therefore more expensive, but more accurate
• Always converges to a local minimum, regardless of the data
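A sketch of batch gradient descent for a single sigmoid unit in Python, using the gradient above; the training data (OR with 0/1 targets to match the sigmoid's range), learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative training data: rows are [x0=1, x1, x2]; targets are in [0, 1]
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 1.0])   # OR, with 0/1 targets for the sigmoid's range

w = np.zeros(3)
eta = 0.5

for _ in range(2000):
    O = sigmoid(X @ w)                       # outputs for all training instances
    grad = -X.T @ ((T - O) * O * (1 - O))    # dE/dw for E = 0.5 * sum((T - O)**2)
    w -= eta * grad                          # one batch update uses all instances

print("weights:", w)
print("outputs:", np.round(sigmoid(X @ w), 2))
```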
Multiclass Problems
• E.g., 4 classes
• Two possible solutions:
• Use one perceptron (output unit) and interpret the result as follows:
• Output 0–0.25: class 1
• Output 0.25–0.5: class 2
• Output 0.5–0.75: class 3
• Output 0.75–1: class 4
• Use four output units and a binary encoding of the output (1-of-M encoding)
1-of-M Encoding
• Assign to the class with the largest output
(Figure: a network with inputs x0 … x4 and four output units, one output per class: class 1, class 2, class 3, class 4)
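A small sketch of the 1-of-M decision rule in Python; the output values are illustrative:

```python
import numpy as np

def predict_class(outputs):
    """1-of-M encoding: one output unit per class; pick the class with the largest output."""
    return int(np.argmax(outputs)) + 1   # classes numbered 1..M

# Illustrative outputs of the four output units for one instance
print(predict_class([0.1, 0.7, 0.15, 0.05]))   # 2
```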
Feed-Forward Neural Networks
(Figure: a layered network with an input layer x0 … x4, hidden layer 1, hidden layer 2, and an output layer; connections run only forward, from each layer to the next)
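A sketch of the forward pass through such a network in Python, with sigmoid units throughout; the layer sizes and random weights are illustrative assumptions, not an architecture taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weight_matrices):
    """Propagate an input through the layers; each layer sees the previous layer's outputs plus a bias of 1."""
    activation = x
    for W in weight_matrices:
        activation = np.append(1.0, activation)   # x0 = 1 plays the role of the bias input
        activation = sigmoid(W @ activation)
    return activation

# Illustrative architecture: 4 inputs -> hidden layer of 3 -> hidden layer of 3 -> 1 output
layer_shapes = [(3, 5), (3, 4), (1, 4)]            # (units, inputs incl. bias) per layer
weights = [rng.normal(size=shape) for shape in layer_shapes]
print(forward(np.array([0.0, 4.0, 3.1, 2.0]), weights))
```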
Increased Expressiveness Example: Exclusive OR
• No line (no set of three weights w0, w1, w2) can separate the training examples (learn the true function)
(Figure: the XOR training data in the (x1, x2) plane; (0, 0) and (1, 1) have class -1, while (0, 1) and (1, 0) have class 1, and a single unit sgn(w2x2 + w1x1 + w0) cannot separate them)
Increased Expressiveness Example
(Figure: a network for the same problem with one hidden layer: the inputs x0, x1, x2 connect to two hidden units H1 and H2 with weights wi,j, and the hidden units connect to the output unit O with weights w'1,1 and w'2,1)
Increased Expressiveness Example
(Figures: the hidden-layer network with concrete weights (thresholds of ±0.5 on the hidden and output units), evaluated on each of the four XOR training examples in turn. For the inputs (0, 0) and (1, 1) the output unit O computes -1, and for the inputs (0, 1) and (1, 0) it computes 1, so the network classifies all four examples correctly, which no single linear unit can do.)
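A sketch in Python of one hidden-layer network that computes XOR with sgn units. These particular weights are an illustrative choice (H1 behaves like OR, H2 like AND), not necessarily the ones drawn on the slides:

```python
def sgn(a):
    return 1 if a >= 0 else -1

def xor_net(x1, x2):
    """Two sgn hidden units feeding one sgn output unit."""
    h1 = sgn(x1 + x2 - 0.5)        # fires (+1) when at least one input is 1 (OR-like)
    h2 = sgn(x1 + x2 - 1.5)        # fires (+1) only when both inputs are 1 (AND-like)
    return sgn(h1 - h2 - 0.5)      # +1 exactly when h1 fires but h2 does not

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor_net(x1, x2))   # -1, 1, 1, -1
```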
From the Viewpoint of the Output Layer
(Figure: the four training examples T1 … T4 shown twice: in the original (x1, x2) space, where they are not linearly separable, and mapped by the hidden layer into the (H1, H2) space, where a single line can separate them)
From the Viewpoint of the Output Layer
• Each hidden layer maps the data to a new instance space
• Each hidden node is a new, constructed feature
• The original problem may become separable (or easier) in the new space
(Figure: the same mapping of T1 … T4 from the (x1, x2) space into the (H1, H2) space)
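A small sketch of this viewpoint in Python, reusing the illustrative XOR weights from the sketch above: it prints where the hidden layer sends each training example, making it visible that the images are linearly separable even though the originals are not:

```python
def sgn(a):
    return 1 if a >= 0 else -1

def hidden_features(x1, x2):
    """The hidden layer as a feature constructor: maps (x1, x2) to (H1, H2)."""
    return sgn(x1 + x2 - 0.5), sgn(x1 + x2 - 1.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", hidden_features(x1, x2))
# (0,0) and (1,1) map to (-1,-1) and (1,1); (0,1) and (1,0) both map to (1,-1),
# so a single line in (H1, H2) space separates the two classes.
```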