Explore artificial neural networks for supervised learning, from the basic perceptron to training multi-layer networks. See how ANNs are loosely modeled on human neurons and how they relate to modern AI. Learn about the perceptron algorithm, backpropagation, and handling data that is not linearly separable.
Artificial Neural Networks
• Artificial neural networks are (among other things) another technique for supervised learning, alongside k-Nearest Neighbor and decision trees
• As with those methods, a model is fit to training data and then evaluated on test data for classification
Human neuron
• Dendrites pick up signals from other neurons
• When the signals arriving at the dendrites reach a threshold, a signal is sent down the axon to the synapses
Connection with AI
• Most modern AI takes the view of "systems that act rationally"
• Implementing neurons in a computer is closer to "systems that think like humans"
• Why artificial neural networks, then?
  • They are "universal" function fitters
  • Potential for massive parallelism
  • Some amount of fault tolerance
  • Trainable by inductive learning, like other supervised learning techniques
Perceptron Example
• A perceptron for tumor diagnosis: three input units, # of tumors ($w_1 = -0.1$), avg area ($w_2 = 0.9$), and avg density ($w_3 = 0.1$), feed a single output unit
• The output unit produces 1 = malignant or 0 = benign
The Perceptron: Input Units
• Input units correspond to the features of the original problem
• If a feature is numeric, it is often scaled to lie between $-1$ and $1$
• If a feature is discrete, one input node is often created per category (see the sketch below)
• Alternatively, assign the categories values on a single node, which imposes an ordering on them
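A minimal sketch of these two encodings; the feature ranges and category sets here are made up for illustration:

```python
def scale(x, lo, hi):
    """Rescale a numeric feature from [lo, hi] into [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def one_hot(value, categories):
    """One input node per category: 1 for the matching node, 0 elsewhere."""
    return [1.0 if c == value else 0.0 for c in categories]

print(scale(7.5, 0.0, 10.0))                       # 0.5
print(one_hot("green", ["red", "green", "blue"]))  # [0.0, 1.0, 0.0]
```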
The Perceptron: Weights
• Weights represent the importance of each input unit
• Combined with the input units, they feed the output unit
• The output unit receives as input the weighted sum $\sum_{i=1}^{n} w_i x_i$
The Perceptron: Output Unit
• The output unit applies an activation function to its input to decide what the correct output is
• Sample activation function, a step with threshold $t$: output $1$ if $\sum_i w_i x_i > t$, else output $0$
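Putting the pieces together for the tumor example above. The weights come from the example; the input values and the threshold $t = 0.5$ are made-up numbers for illustration:

```python
# Tumor perceptron: weighted sum of inputs, then a step activation.
def step(z, t):
    """Step activation: 1 (malignant) if z exceeds threshold t, else 0 (benign)."""
    return 1 if z > t else 0

weights  = [-0.1, 0.9, 0.1]   # w1: # of tumors, w2: avg area, w3: avg density
features = [2.0, 0.8, 0.3]    # hypothetical (scaled) input values

z = sum(w * x for w, x in zip(weights, features))  # weighted sum = 0.55
print(step(z, t=0.5))                              # 0.55 > 0.5, so output 1
```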
Simplifying the threshold
• Managing a separate threshold is cumbersome
• Incorporate it as a "virtual" weight: add a constant input $x_0 = 1$ with weight $w_0 = -t$, so the test becomes $\sum_{i=0}^{n} w_i x_i > 0$
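The same decision as above with the threshold folded in as a virtual weight:

```python
# Fix x0 = 1, set w0 = -t, and compare the weighted sum against 0 instead of t.
weights  = [-0.5, -0.1, 0.9, 0.1]  # w0 = -t = -0.5, then w1..w3 as before
features = [1.0, 2.0, 0.8, 0.3]    # x0 = 1 is the constant bias input

z = sum(w * x for w, x in zip(weights, features))
print(1 if z > 0 else 0)           # 0.55 - 0.5 = 0.05 > 0, so output 1
```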
How to learn the right weights?
• We need to redefine the perceptron slightly
• The step function is no good for learning: we need something differentiable
• Replace it with a sigmoid approximation
Sigmoid function
• $\sigma(x) = \frac{1}{1 + e^{-bx}}$ is a good, differentiable approximation to the step function
• As $b \to \infty$, the sigmoid approaches the step function
• We'll just take $b = 1$ for simplicity
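A quick numeric illustration of the steepness parameter (the sample points are arbitrary):

```python
import math

# Sigmoid with steepness b: sigma(x) = 1 / (1 + exp(-b*x)).
def sigmoid(x, b=1.0):
    return 1.0 / (1.0 + math.exp(-b * x))

# As b grows, the curve sharpens toward the step function.
for b in (1, 5, 100):
    print(b, [round(sigmoid(x, b), 3) for x in (-1.0, -0.1, 0.1, 1.0)])
# b = 1:   gentle slope, values well inside (0, 1)
# b = 100: essentially 0 or 1 everywhere, like a step
```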
Computing weights
• Think of weight learning as gradient descent: the weights are the variables, and we minimize the error on the training set, e.g. the squared error $E = \frac{1}{2}\sum_d (y_d - o_d)^2$ over training examples $d$
• For a sigmoid unit, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which gives the per-example update $w_i \leftarrow w_i + \alpha\,(y - o)\,o\,(1 - o)\,x_i$ (a worked sketch follows)
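A minimal gradient-descent sketch for a single sigmoid unit. The training data (logical AND), learning rate, and epoch count are illustrative choices:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Gradient descent on E = 1/2 * (y - o)^2 per example, o = sigmoid(sum_i w_i x_i).
# dE/dw_i = -(y - o) * o * (1 - o) * x_i, using sigma'(z) = sigma(z)(1 - sigma(z)).
random.seed(1)
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]  # AND; x[0] = 1 is the bias
w = [random.uniform(-0.5, 0.5) for _ in range(3)]
alpha = 0.5

for epoch in range(5000):
    for x, y in data:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(len(w)):
            w[i] += alpha * (y - o) * o * (1 - o) * x[i]

for x, y in data:
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    print(x[1:], round(o, 2))  # outputs should approach 0, 0, 0, 1
```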
Can appropriate weights always be found?
• ONLY IF the data is linearly separable, i.e. a single hyperplane can separate the classes
• XOR, for example, is not linearly separable, so no single perceptron can compute it
What if data is not linearly separable? Use a neural network.
• Insert a layer of hidden units $V_j$ between the input units and the output unit $O$
• Each hidden unit is a perceptron
• The output unit is another perceptron that takes the hidden units' outputs as its inputs (a training sketch follows below)
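Below is a minimal sketch of such a two-layer network trained by backpropagation on XOR, the classic non-linearly-separable example. The hidden-layer size, learning rate, and epoch count are made-up choices:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two-layer network: n_hidden sigmoid perceptrons feeding one sigmoid output unit.
random.seed(0)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR
n_hidden = 3
W_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]  # bias + 2 inputs
w_o = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]                  # bias + hidden units
alpha = 0.5

for epoch in range(10000):
    for x, y in data:
        xi = [1] + x                                        # prepend bias input x0 = 1
        v = [sigmoid(sum(w * a for w, a in zip(row, xi))) for row in W_h]
        vo = [1] + v                                        # bias + hidden outputs
        o = sigmoid(sum(w * a for w, a in zip(w_o, vo)))
        delta_o = (y - o) * o * (1 - o)                     # output unit's error term
        for j in range(n_hidden):                           # backpropagate to hidden units
            delta_j = v[j] * (1 - v[j]) * w_o[j + 1] * delta_o
            for i in range(3):
                W_h[j][i] += alpha * delta_j * xi[i]
        for j in range(n_hidden + 1):
            w_o[j] += alpha * delta_o * vo[j]

for x, y in data:
    xi = [1] + x
    v = [sigmoid(sum(w * a for w, a in zip(row, xi))) for row in W_h]
    print(x, round(sigmoid(sum(w * a for w, a in zip(w_o, [1] + v))), 2))
# On most runs the outputs come out near 0, 1, 1, 0; backprop can also get
# stuck in a local minimum, one of the issues raised on the next slide.
```

The hidden units each carve out a hyperplane in the input space, and the output unit combines them, which is how the network gets past the single perceptron's linear-separability limit.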
Neural Networks and machine learning issues
• Neural networks can represent any training set, if enough hidden units are used
• How long do they take to train?
• How much memory do they need?
• Does backprop find the best set of weights?
• How do we deal with overfitting?
• How do we interpret the results?