Neural Networks CSE 4309 – Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Perceptrons • A perceptron is a function that maps D-dimensional vectors to real numbers. • For notational convenience, we add a zero-th dimension to every input vector, that is always equal to 1. • $x_0$ is called the bias input. It is always equal to 1. • $w_0$ is called the bias weight. It is optimized during training.
Perceptrons • A perceptron computes its output in two steps: • First step: $a = \sum_{d=0}^{D} w_d x_d = \boldsymbol{w}^T \boldsymbol{x}$. • Second step: $z = h(a)$. • In a single formula: $z = h(\boldsymbol{w}^T \boldsymbol{x})$.
Perceptrons • A perceptron computes its output in two steps: • First step: $a = \boldsymbol{w}^T \boldsymbol{x}$. • Second step: $z = h(a)$. • $h$ is called an activation function. • For example, $h$ could be the sigmoidal function: $h(a) = \frac{1}{1 + e^{-a}}$.
Perceptrons • We have seen perceptrons before, we just did not call them perceptrons. • For example, logistic regression produces a classifier function $y(\boldsymbol{x}) = \sigma(\boldsymbol{w}^T \boldsymbol{x})$. • If we set the activation function $h$ to the sigmoidal function $\sigma$, then $y(\boldsymbol{x})$ is a perceptron.
Perceptrons and Neurons • Perceptrons are inspired by neurons. • Neurons are the cells forming the nervous system and the brain. • Neurons somehow sum up their inputs, and if the sum exceeds a threshold, they "fire". • Since brains are "intelligent", computer scientists have been hoping that perceptron-based systems can be used to model intelligence.
Activation Functions • A perceptron produces output $z = h(\boldsymbol{w}^T \boldsymbol{x})$. • One choice for the activation function $h$: the step function, $h(a) = 1$ if $a \geq 0$, and $h(a) = 0$ otherwise. • The step function is useful for providing some intuitive examples. • It is not useful for actual real-world systems. • Reason: it is not differentiable, so it does not allow optimization via gradient descent.
Activation Functions • A perceptron produces output $z = h(\boldsymbol{w}^T \boldsymbol{x})$. • Another choice for the activation function $h$: the sigmoidal function, $h(a) = \frac{1}{1 + e^{-a}}$. • The sigmoidal is often used in real-world systems. • It is a differentiable function, so it allows the use of gradient descent.
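To make the two-step computation concrete, here is a minimal Python sketch (not part of the original slides) of the step and sigmoidal activation functions and of a perceptron's output; the names step, sigmoid, and perceptron_output are illustrative choices.

```python
import numpy as np

def step(a):
    """Step activation: 1 if the weighted sum is nonnegative, else 0."""
    return 1.0 if a >= 0 else 0.0

def sigmoid(a):
    """Sigmoidal activation: differentiable, output strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-a))

def perceptron_output(w, x, h=sigmoid):
    """Two-step perceptron computation: a = w^T x, then z = h(a).
    Both w and x are (D+1)-dimensional, with x[0] = 1 as the bias input."""
    a = np.dot(w, x)   # first step: weighted sum
    return h(a)        # second step: activation
```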
Example: The AND Perceptron • Suppose we use the step function for activation. • Suppose boolean value false is represented as number 0. • Suppose boolean value true is represented as number 1. • Then, the perceptron below computes the boolean AND function (example weights are given in the sketch after the verification slides): Output: false AND false = false; false AND true = false; true AND false = false; true AND true = true.
Example: The AND Perceptron • Verification: If $x_1 = 0$ and $x_2 = 0$: • The weighted sum $w_0 + w_1 x_1 + w_2 x_2 = w_0$ is negative, so the step function outputs 0. • Corresponds to case false AND false = false.
Example: The AND Perceptron • Verification: If $x_1 = 0$ and $x_2 = 1$: • The weighted sum $w_0 + w_2$ is negative, so the step function outputs 0. • Corresponds to case false AND true = false.
Example: The AND Perceptron • Verification: If $x_1 = 1$ and $x_2 = 0$: • The weighted sum $w_0 + w_1$ is negative, so the step function outputs 0. • Corresponds to case true AND false = false.
Example: The AND Perceptron • Verification: If $x_1 = 1$ and $x_2 = 1$: • The weighted sum $w_0 + w_1 + w_2$ is nonnegative, so the step function outputs 1. • Corresponds to case true AND true = true.
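The slide figure with the actual AND weights is not reproduced here; one common choice that works with the step function is a bias weight of -1.5 and input weights of 1 and 1, as in this sketch (reusing perceptron_output and step from the earlier sketch):

```python
w_and = np.array([-1.5, 1.0, 1.0])   # example weights: bias, w1, w2

for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([1.0, x1, x2])  # bias input x0 = 1
        z = perceptron_output(w_and, x, h=step)
        print(f"{x1} AND {x2} -> {z}")   # prints 0, 0, 0, 1
```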
Example: The OR Perceptron • Suppose we use the step function for activation. • Suppose boolean value false is represented as number 0. • Suppose boolean value true is represented as number 1. • Then, the perceptron below computes the boolean OR function (example weights are given in the sketch following the NOT perceptron verification): Output: false OR false = false; false OR true = true; true OR false = true; true OR true = true.
Example: The OR Perceptron • Verification: If $x_1 = 0$ and $x_2 = 0$: • The weighted sum $w_0$ is negative, so the step function outputs 0. • Corresponds to case false OR false = false.
Example: The OR Perceptron • Verification: If $x_1 = 0$ and $x_2 = 1$: • The weighted sum $w_0 + w_2$ is nonnegative, so the step function outputs 1. • Corresponds to case false OR true = true.
Example: The OR Perceptron • Verification: If $x_1 = 1$ and $x_2 = 0$: • The weighted sum $w_0 + w_1$ is nonnegative, so the step function outputs 1. • Corresponds to case true OR false = true.
Example: The OR Perceptron • Verification: If $x_1 = 1$ and $x_2 = 1$: • The weighted sum $w_0 + w_1 + w_2$ is nonnegative, so the step function outputs 1. • Corresponds to case true OR true = true.
Example: The NOT Perceptron • Suppose we use the step function for activation. • Suppose boolean value false is represented as number 0. • Suppose boolean value true is represented as number 1. • Then, the perceptron below computes the boolean NOT function (example weights are given in the sketch after the verification slides below): Output: NOT(false) = true; NOT(true) = false.
Example: The NOT Perceptron • Verification: If $x_1 = 0$: • The weighted sum $w_0$ is nonnegative, so the step function outputs 1. • Corresponds to case NOT(false) = true.
Example: The NOT Perceptron • Verification: If $x_1 = 1$: • The weighted sum $w_0 + w_1$ is negative, so the step function outputs 0. • Corresponds to case NOT(true) = false.
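The slide figures with the OR and NOT weights are likewise not reproduced; the weights below are just one workable choice, again reusing the helpers from the earlier sketch.

```python
w_or  = np.array([-0.5, 1.0, 1.0])   # example OR weights: bias, w1, w2
w_not = np.array([ 0.5, -1.0])       # example NOT weights: bias, w1

for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([1.0, x1, x2])
        print(f"{x1} OR {x2} -> {perceptron_output(w_or, x, h=step)}")   # 0, 1, 1, 1

for x1 in (0, 1):
    x = np.array([1.0, x1])
    print(f"NOT {x1} -> {perceptron_output(w_not, x, h=step)}")          # 1, 0
```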
The XOR Function • As before, we represent false with 0 and true with 1. • The figure shows the four input points of the XOR function: green corresponds to output value true, red corresponds to output value false. • The two classes (true and false) are not linearly separable. • Therefore, no perceptron can compute the XOR function. Output: false XOR false = false; false XOR true = true; true XOR false = true; true XOR true = false.
Our First Neural Network: XOR • A neural network is built using perceptrons as building blocks. • The inputs to some perceptrons are outputs of other perceptrons. • Here is an example neural network computing the XOR function (units 3 and 4 are hidden units; unit 5 is the output unit).
Our First Neural Network: XOR • To simplify the picture, we do not show the bias input anymore. • We just show the bias weights. • Besides the bias input, there are two inputs: $x_1$, $x_2$.
Our First Neural Network: XOR • The XOR network shows how individual perceptrons can be combined to perform more complicated functions. • One hidden unit is an OR unit, the other hidden unit is an AND unit, and the output unit computes A AND (NOT B), where A is the output of the OR unit and B is the output of the AND unit.
Computing the Output: An Example • Suppose that $x_1 = 0$ and $x_2 = 1$ (corresponding to false XOR true). • For the OR unit: • The dot product of its weights with the input $(1, 0, 1)$ is nonnegative. • The activation function (assuming a step function) outputs 1.
Computing the Output: An Example • Suppose that $x_1 = 0$ and $x_2 = 1$ (corresponding to false XOR true). • For the AND unit: • The dot product of its weights with the input $(1, 0, 1)$ is negative. • The activation function (assuming a step function) outputs 0.
Computing the Output: An Example • Suppose that $x_1 = 0$ and $x_2 = 1$ (corresponding to false XOR true). • For the output unit (computing the A AND (NOT B) function): • One input is the output of the OR unit, which is 1. • The other input is the output of the AND unit, which equals 0.
Computing the Output: An Example • Suppose that $x_1 = 0$ and $x_2 = 1$ (corresponding to false XOR true). • For the output unit (computing the A AND (NOT B) function): • The dot product of its weights with the input $(1, 1, 0)$ is nonnegative. • The activation function (assuming a step function) outputs 1.
Verifying the XOR Network • We can follow the same process to compute the output of this network for the other three cases. • Here we consider the case where $x_1 = 0$ and $x_2 = 0$ (corresponding to false XOR false). • The output is 0, as it should be.
Verifying the XOR Network • We can follow the same process to compute the output of this network for the other three cases. • Here we consider the case where $x_1 = 1$ and $x_2 = 0$ (corresponding to true XOR false). • The output is 1, as it should be.
Verifying the XOR Network • We can follow the same process to compute the output of this network for the other three cases. • Here we consider the case where $x_1 = 1$ and $x_2 = 1$ (corresponding to true XOR true). • The output is 0, as it should be.
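The exact weights of the XOR network appear only in the slide figures; the sketch below wires an OR unit, an AND unit, and an A AND (NOT B) output unit with example weights and checks all four cases (reusing perceptron_output and step from the earlier sketch).

```python
# Hidden layer: an OR unit and an AND unit (example weights, as above).
w_or  = np.array([-0.5, 1.0, 1.0])
w_and = np.array([-1.5, 1.0, 1.0])
# Output unit computes A AND (NOT B), where A = OR output, B = AND output.
# Example weights: it fires only when A = 1 and B = 0.
w_out = np.array([-0.5, 1.0, -1.0])

def xor_network(x1, x2):
    x = np.array([1.0, x1, x2])
    a = perceptron_output(w_or,  x, h=step)       # hidden unit: OR
    b = perceptron_output(w_and, x, h=step)       # hidden unit: AND
    y = np.array([1.0, a, b])                     # bias input for the output unit
    return perceptron_output(w_out, y, h=step)    # output unit: A AND (NOT B)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"{x1} XOR {x2} -> {xor_network(x1, x2)}")   # 0, 1, 1, 0
```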
Neural Networks • This neural network example consists of six units: • Three input units (including the bias input, which is not shown). • Three perceptrons. • Yes, in the notation we will be using, inputs count as units.
Neural Networks • Weights are denoted as $w_{ij}$. • Weight $w_{ij}$ belongs to the edge that connects the output of unit $i$ with an input of unit $j$. • Units 0, 1, and 2 are the input units in this example.
Neural Network Layers • Oftentimes, neural networks are organized into layers. • The input layer is the initial layer of input units (units 0, 1, 2 in our example). • The output layer is at the end (unit 5 in our example). • Zero, one or more hidden layers can be between the input and output layers.
Neural Network Layers • There is only one hidden layer in our example, containing units 3 and 4. • Each hidden layer's inputs are outputs from the previous layer. • Each hidden layer's outputs are inputs to the next layer. • The first hidden layer's inputs come from the input layer. • The last hidden layer's outputs are inputs to the output layer.
Feedforward Networks • Feedforward networks are networks where there are no directed loops. • If there are no loops, the output of a neuron cannot (directly or indirectly) influence its input. • While there are varieties of neural networks that are not feedforward or layered, our main focus will be layered feedforward networks.
Computing the Output • Notation: L is the number of layers. • Layer 1 is the input layer, layer L is the output layer. • Given values for the input units, the output is computed as follows: • For $l = 2$ to $L$: • Compute the outputs of layer $l$, given the outputs of layer $l-1$.
Computing the Output • To compute the outputs of layer $l$ (where $2 \leq l \leq L$), we simply need to compute the output of each perceptron belonging to layer $l$. • For each such perceptron, its inputs are coming from outputs of perceptrons at layer $l-1$. • Remember, we compute layer outputs in increasing order of $l$.
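A minimal sketch of this layer-by-layer computation for a fully connected, layered feedforward network; representing the network as a list of per-layer weight matrices is an illustrative choice, not notation from the slides (reuses numpy and sigmoid from the earlier sketch).

```python
def forward(layer_weights, x, h=sigmoid):
    """Compute the network output layer by layer.
    layer_weights[0] is the weight matrix of layer 2, layer_weights[1] of layer 3,
    and so on; each row holds a unit's bias weight followed by its weights on the
    previous layer's outputs. x is the input vector without the bias."""
    z = np.asarray(x, dtype=float)                # outputs of the input layer
    for W in layer_weights:                       # layers 2, 3, ..., L in order
        z_with_bias = np.concatenate(([1.0], z))  # prepend the bias input
        z = h(W @ z_with_bias)                    # outputs of the current layer
    return z

# Example: the XOR network above, with the step function applied elementwise.
weights = [np.array([[-0.5, 1.0, 1.0],      # hidden layer: OR unit, AND unit
                     [-1.5, 1.0, 1.0]]),
           np.array([[-0.5, 1.0, -1.0]])]   # output layer: A AND (NOT B)
print(forward(weights, [0, 1], h=lambda a: (a >= 0).astype(float)))  # [1.]
```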
What Neural Networks Can Compute • An individual perceptron is a linear classifier. • The weights of the perceptron define a linear boundary between two classes. • Layered feedforward neural networks with one hidden layer can approximate any continuous function to arbitrary accuracy. • Layered feedforward neural networks with two hidden layers can approximate any mathematical function. • This has been known for decades, and is one reason scientists have been optimistic about the potential of neural networks to model intelligent systems. • Another reason is the analogy between neural networks and biological brains, which have been a standard of intelligence we are still trying to achieve. • There is only one catch: how do we find the right weights?
Training a Neural Network • In linear regression, for the sum-of-squares error, we could find the best weights using a closed-form formula. • In logistic regression, for the cross-entropy error, we could find the best weights using an iterative method. • In neural networks, we cannot find the best weights (unless we have an astronomical amount of luck). • We only have optimization methods that find local minima of the error function. • Still, in recent years such methods have produced spectacular results in real-world applications.
Notation for Training Set • We define $\boldsymbol{w}$ to be the vector of all weights in the neural network. • We have a set $\boldsymbol{x}_1, \dots, \boldsymbol{x}_N$ of N training examples. • Each $\boldsymbol{x}_n$ is a (D+1)-dimensional column vector. • Dimension 0 is the bias input, always set to 1. • We also have a set $\boldsymbol{t}_1, \dots, \boldsymbol{t}_N$ of N target outputs. • $\boldsymbol{t}_n$ is the target output for training example $\boldsymbol{x}_n$. • Each $\boldsymbol{t}_n$ is a K-dimensional column vector. • Note: K typically is not equal to D.
Perceptron Learning • Before we discuss how to train an entire neural network, we start with a single perceptron. • Remember: given input $\boldsymbol{x}$, a perceptron computes its output using this formula: $z = h(\boldsymbol{w}^T \boldsymbol{x})$. • We use sum-of-squares as our error function. • $E_n(\boldsymbol{w})$ is the contribution of training example $\boldsymbol{x}_n$: $E_n(\boldsymbol{w}) = \frac{1}{2} \left( h(\boldsymbol{w}^T \boldsymbol{x}_n) - t_n \right)^2$. • The overall error is defined as: $E(\boldsymbol{w}) = \sum_{n=1}^{N} E_n(\boldsymbol{w})$. • Important: a single perceptron has a single output. • Therefore, for perceptrons (but NOT for neural networks in general), we assume that the target output $t_n$ is one-dimensional.
Perceptron Learning • Suppose that a perceptron is using the step function as its activation function $h$. • Can we apply gradient descent in that case? • No, because $h(\boldsymbol{w}^T \boldsymbol{x})$ is not differentiable. • Small changes of $\boldsymbol{w}$ usually lead to no change at all in $h(\boldsymbol{w}^T \boldsymbol{x})$. • The only exception is when the change in $\boldsymbol{w}$ causes $\boldsymbol{w}^T \boldsymbol{x}$ to switch sign (from positive to negative, or from negative to positive).
Perceptron Learning • A better option is setting $h$ to the sigmoid function: $h(a) = \frac{1}{1 + e^{-a}}$. • Then, measured just on a single training object $\boldsymbol{x}_n$, the error is defined as: $E_n(\boldsymbol{w}) = \frac{1}{2} \left( h(\boldsymbol{w}^T \boldsymbol{x}_n) - t_n \right)^2$. • Note: here we use the sum-of-squares error, and not the cross-entropy error that we used for logistic regression. • Also note: if our neural network is a single perceptron, then the target output $t_n$ is one-dimensional.
Computing the Gradient • In this form, $E_n(\boldsymbol{w})$ is differentiable. • If we do the calculations, the gradient turns out to be: $\nabla E_n(\boldsymbol{w}) = \left( z_n - t_n \right) z_n \left( 1 - z_n \right) \boldsymbol{x}_n$, where $z_n = h(\boldsymbol{w}^T \boldsymbol{x}_n)$. • Note that $\nabla E_n(\boldsymbol{w})$ is a (D+1)-dimensional vector: it is a scalar, $(z_n - t_n)\, z_n (1 - z_n)$, multiplied by the vector $\boldsymbol{x}_n$.
Weight Update • So, we update the weight vector as follows: $\boldsymbol{w} \leftarrow \boldsymbol{w} - \eta \left( z_n - t_n \right) z_n \left( 1 - z_n \right) \boldsymbol{x}_n$. • As before, $\eta$ is the learning rate parameter. • It is a positive real number that should be chosen carefully, so as not to be too big or too small. • In terms of individual weights $w_d$, the update rule is: $w_d \leftarrow w_d - \eta \left( z_n - t_n \right) z_n \left( 1 - z_n \right) x_{n,d}$.
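A sketch of this update rule for a single training example, assuming the sigmoid perceptron and the sum-of-squares error; the function name perceptron_update and the use of numpy arrays are illustrative (reuses sigmoid from the earlier sketch).

```python
def perceptron_update(w, x_n, t_n, eta):
    """One gradient-descent step on E_n(w) = 0.5 * (z_n - t_n)^2.
    w and x_n are (D+1)-dimensional numpy arrays; eta is the learning rate."""
    z_n = sigmoid(np.dot(w, x_n))                        # perceptron output
    gradient = (z_n - t_n) * z_n * (1.0 - z_n) * x_n     # (D+1)-dimensional vector
    return w - eta * gradient
```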
Perceptron Learning - Summary • Input: Training inputs $\boldsymbol{x}_1, \dots, \boldsymbol{x}_N$, target outputs $t_1, \dots, t_N$. • Step 1: Extend each $\boldsymbol{x}_n$ to a (D+1)-dimensional vector, by adding 1 (the bias input) as the value for dimension 0. • Step 2: Initialize weights $w_d$ to small random numbers. For example, set each $w_d$ between -0.1 and 0.1. • Step 3: For n = 1 to N: • Compute $z_n = h(\boldsymbol{w}^T \boldsymbol{x}_n)$. • For d = 0 to D: $w_d \leftarrow w_d - \eta \left( z_n - t_n \right) z_n \left( 1 - z_n \right) x_{n,d}$. • Step 4: If some stopping criterion has been met, exit. • Step 5: Else, go to step 3.
Stopping Criterion • At step 4 of the perceptron learning algorithm, we need to decide whether to stop or not. • One thing we can do is: • Compute the cumulative squared error $E(\boldsymbol{w})$ of the perceptron at that point: $E(\boldsymbol{w}) = \sum_{n=1}^{N} \frac{1}{2} \left( z_n - t_n \right)^2$. • Compare the current value of $E(\boldsymbol{w})$ with the value of $E(\boldsymbol{w})$ computed at the previous iteration. • If the difference is too small (e.g., smaller than 0.00001), we stop.
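Putting the steps together, a sketch of the full perceptron learning loop with the error-difference stopping criterion described above; the learning rate, the threshold value, and the fixed iteration cap are illustrative choices (reuses sigmoid and perceptron_update from the earlier sketches).

```python
def train_perceptron(X, t, eta=0.1, tol=1e-5, max_iterations=10000):
    """X: N x (D+1) array of training inputs, with X[:, 0] = 1 (bias inputs).
    t: length-N array of target outputs. Returns the learned weight vector."""
    N, D_plus_1 = X.shape
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.1, 0.1, size=D_plus_1)   # step 2: small random weights
    previous_error = np.inf
    for _ in range(max_iterations):
        for n in range(N):                      # step 3: one pass over the data
            w = perceptron_update(w, X[n], t[n], eta)
        z = sigmoid(X @ w)
        error = 0.5 * np.sum((t - z) ** 2)      # cumulative squared error E(w)
        if abs(previous_error - error) < tol:   # step 4: stopping criterion
            break
        previous_error = error
    return w
```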
Using Perceptrons for Multiclass Problems • “Multiclass” means that we have more than two classes. • A perceptron outputs a number between 0 and 1. • This is sufficient only for binary classification problems. • For more than two classes, there are many different options. • We will follow a general approach called one-versus-all classification.
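As a preview of the general idea (the details follow in later slides), a minimal sketch of one-versus-all classification: train one perceptron per class, with target 1 for that class and 0 for all others, and classify a new input by the perceptron with the highest output; the function names are illustrative (reuses sigmoid and train_perceptron from the earlier sketches).

```python
def train_one_versus_all(X, labels, num_classes):
    """Train one perceptron per class; labels is an array of ints in 0..num_classes-1."""
    return [train_perceptron(X, (labels == c).astype(float))
            for c in range(num_classes)]

def classify(weights_per_class, x):
    """Assign x (a (D+1)-dimensional vector with bias input 1) to the class
    whose perceptron gives the highest output."""
    outputs = [sigmoid(np.dot(w, x)) for w in weights_per_class]
    return int(np.argmax(outputs))
```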