Last lecture summary: Naïve Bayes Classifier
Bayes Rule. Prior and likelihood must be learnt (i.e. estimated from the data):
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
where $P(Y \mid X)$ is the posterior, $P(X \mid Y)$ the likelihood, $P(Y)$ the prior, and $P(X)$ the normalization constant.
Learning the prior
• A hundred independently drawn training examples will usually suffice to obtain a reasonable estimate of P(Y).
Learning the likelihood
• The Naïve Bayes Assumption: assume that all features are independent given the class label Y, i.e. $P(X_1, \dots, X_n \mid Y) = \prod_i P(X_i \mid Y)$.
Example – Learning Phase
• P(Play=Yes) = 9/14, P(Play=No) = 5/14
• P(Outlook=Sunny | Play=Yes) = 2/9 (one entry of the conditional probability tables estimated from the training data; the remaining entries appear in the lookup tables below)
Example – Prediction
x' = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)
Look up tables:
• P(Outl=Sunny | Play=Yes) = 2/9, P(Outl=Sunny | Play=No) = 3/5
• P(Temp=Cool | Play=Yes) = 3/9, P(Temp=Cool | Play=No) = 1/5
• P(Hum=High | Play=Yes) = 3/9, P(Hum=High | Play=No) = 4/5
• P(Wind=Strong | Play=Yes) = 3/9, P(Wind=Strong | Play=No) = 3/5
• P(Play=Yes) = 9/14, P(Play=No) = 5/14
P(Yes | x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
Given that P(Yes | x') < P(No | x'), we label x' as "No".
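The same prediction can be reproduced with a few lines of Python. This is a minimal sketch, assuming the probability tables above; the variable names are mine, not part of the lecture.

```python
# Naive Bayes prediction for x' = (Sunny, Cool, High, Strong),
# using the probabilities from the lookup tables above.
priors = {"Yes": 9/14, "No": 5/14}
likelihoods = {
    "Yes": {"Outl=Sunny": 2/9, "Temp=Cool": 3/9, "Hum=High": 3/9, "Wind=Strong": 3/9},
    "No":  {"Outl=Sunny": 3/5, "Temp=Cool": 1/5, "Hum=High": 4/5, "Wind=Strong": 3/5},
}
x_new = ["Outl=Sunny", "Temp=Cool", "Hum=High", "Wind=Strong"]

scores = {}
for label, prior in priors.items():
    score = prior
    for feature in x_new:
        score *= likelihoods[label][feature]   # naive independence assumption
    scores[label] = score

print(scores)                        # {'Yes': 0.0053, 'No': 0.0206} (approximately)
print(max(scores, key=scores.get))   # 'No'
```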
TP, TN, FP, FN
• Precision, Positive Predictive Value (PPV): TP / (TP + FP)
• Recall, Sensitivity, True Positive Rate (TPR), Hit rate: TP / P = TP / (TP + FN)
• False Positive Rate (FPR), Fall-out: FP / N = FP / (FP + TN)
• Specificity, True Negative Rate (TNR): TN / (TN + FP) = 1 - FPR
• Accuracy: (TP + TN) / (TP + TN + FP + FN)
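For reference, a small sketch that derives these quantities from the four confusion-matrix counts; the function name and the example counts are mine.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard rates computed from TP, TN, FP, FN counts."""
    precision   = tp / (tp + fp)                    # PPV
    recall      = tp / (tp + fn)                    # sensitivity, TPR, hit rate
    fpr         = fp / (fp + tn)                    # fall-out
    specificity = tn / (tn + fp)                    # TNR = 1 - FPR
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "FPR": fpr, "specificity": specificity, "accuracy": accuracy}

print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
```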
Biological motivation
• The human brain has been estimated to contain about 10^11 brain cells (neurons).
• A neuron is an electrically excitable cell that processes and transmits information by electrochemical signaling.
• Each neuron is connected with other neurons through connections called synapses.
• A typical neuron possesses a cell body (often called the soma), dendrites (many, on the order of mm in length), and an axon (one, 10 cm – 1 m).
• A synapse permits a neuron to pass an electrical or chemical signal to another cell.
• A synapse can be either excitatory or inhibitory.
• Synapses are of different strengths (the stronger the synapse, the more important it is).
• The effects of synapses accumulate inside the neuron.
• When the cumulative effect of the synapses reaches a certain threshold, the neuron is activated and a signal is sent along the axon, through which the neuron is connected to other neuron(s).
Simplistic view of the function of a neuron
• The neuron accumulates positive/negative stimuli from other neurons.
• The accumulated stimulus is then processed further to produce an output, i.e. the neuron sends an output signal to the neurons connected to it.
Neural networks for applied science and engineering, Samarasinghe
Threshold neuron
Warren McCulloch (1898 - 1969), Walter Pitts (1923 - 1969)
1st mathematical model of a neuron – the McCulloch & Pitts binary (threshold) neuron
• only binary inputs and output
• the weights are pre-set, no learning
• building blocks: inputs, weights, activation (transfer) function, output
In this exercise, both weights will be fixed.
• When is the target classified as 0 and when as 1?
• Set a threshold.
• If the weighted sum of the inputs is ≥ the threshold, the input is classified as 1.
• If the weighted sum is < the threshold, it is classified as 0.
• Which threshold would you use? (e.g. the value 1.3 used on the next slide)
• The threshold is incorporated as the weight of one additional input with a fixed constant value.
• Such an input is called the bias.
Because the location of the threshold function defines the two categories, its value of 1.3 determines a classification boundary that can be formulated as $w_1 x_1 + w_2 x_2 = 1.3$, a straight line in the input space.
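A minimal sketch of such a threshold neuron in Python, showing that folding the threshold into a bias weight on a constant input gives the same classifier; the 1.3 threshold comes from the slides, while the weight values in the test loop are placeholders of my own.

```python
def threshold_neuron(x1, x2, w1, w2, theta=1.3):
    """Threshold unit: output 1 if the weighted sum reaches the threshold."""
    u = w1 * x1 + w2 * x2
    return 1 if u >= theta else 0

def threshold_neuron_with_bias(x1, x2, w1, w2, theta=1.3):
    """Same unit with the threshold folded into a bias weight w0 = -theta
    attached to a constant input x0 = 1 (the sum is then compared against 0)."""
    w0, x0 = -theta, 1.0
    u = w0 * x0 + w1 * x1 + w2 * x2
    return 1 if u >= 0 else 0

# The two formulations agree on any input (weights here are placeholders).
for x1, x2 in [(0.0, 0.0), (1.0, 0.5), (0.7, 0.7)]:
    assert threshold_neuron(x1, x2, 1.0, 1.0) == threshold_neuron_with_bias(x1, x2, 1.0, 1.0)
```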
Perceptron (1957), Frank Rosenblatt
He developed the learning algorithm and used his neuron (pattern recognizer = perceptron) for the classification of letters.
• A binary classifier: it maps its input x (a real-valued vector) to a binary value (0 or 1):
$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} > 0 \text{ (including the bias)} \\ 0 & \text{otherwise} \end{cases}$$
• The perceptron can adjust its weights (i.e. it can learn) – the perceptron learning algorithm.
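A compact sketch of the perceptron learning algorithm named above; the learning rate, epoch count, and toy AND data are my own illustrative choices.

```python
import numpy as np

def perceptron_train(X, t, lr=0.1, epochs=20):
    """Perceptron learning: the bias is handled as weight w[0] on a constant input 1."""
    X = np.hstack([np.ones((len(X), 1)), X])     # prepend the bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if np.dot(w, x) > 0 else 0     # current binary output
            w += lr * (target - y) * x           # update only when the output is wrong
    return w

# Toy example: logical AND (linearly separable, so the algorithm converges).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w = perceptron_train(X, t)
print([1 if np.dot(w, np.r_[1, x]) > 0 else 0 for x in X])   # [0, 0, 0, 1]
```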
Multiple output perceptron
• for multicategory (i.e. more than 2 classes) classification
• one output neuron for each class
• the network consists of an input layer and an output layer; it is called single layer (one-layered) or double layer (two-layered) depending on whether the input layer is counted
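A short sketch of the prediction step of such a network, assuming a winner-take-all readout (the most activated output neuron names the class); the weight values are arbitrary and only for illustration.

```python
import numpy as np

def multi_output_predict(W, x):
    """One output neuron per class: W holds one weight row per class,
    with the bias weight in column 0; the most activated neuron wins."""
    x = np.r_[1.0, x]                 # prepend the constant bias input
    activations = W @ x               # one weighted sum per output neuron
    return int(np.argmax(activations))

# Hypothetical 3-class problem with 2 inputs: W is 3 x 3 (bias + 2 weights per class).
W = np.array([[ 0.2,  1.0, -1.0],
              [-0.1, -1.0,  1.0],
              [ 0.0,  0.5,  0.5]])
print(multi_output_predict(W, np.array([2.0, 0.0])))   # class 0 for this input
```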
Learning
• Set the weights $w_i$ (including the threshold/bias weight $w_0$).
• Supervised learning: we know the target values $t$.
• We want the outputs $y$ to be as close as possible to the desired values $t$.
• We define an error E (Sum of Squares Error, we already know this one): $E = \frac{1}{2}\sum_p (t_p - y_p)^2$, summed over the training patterns p.
• "The outputs y should be as close as possible to the targets t" means that E should be minimal.
• So we want to minimize E, which is a function of the weights w.
• E is also called the objective function or sometimes the energy.
• Requirement for the minimum: the partial derivatives $\frac{\partial E}{\partial w_i}$ must be zero.
• The gradient $\nabla E$ (grad E) is a vector pointing in the direction of the greatest rate of increase of the function.
• We want E to decrease, so we move in the direction of $-\nabla E$.
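Written out as a formula, the gradient descent step this slide describes is (a sketch in standard notation; β is the step size, called the learning rate later in the lecture):

$$\mathbf{w}^{new} = \mathbf{w}^{old} - \beta \, \nabla E(\mathbf{w}^{old}), \qquad \nabla E = \left( \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right)$$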
Delta rule
• gradient descent
• How do we train a linear neuron using the delta rule?
• The demonstration will be given for one neuron with one input x1 (and weight w1), no bias, and one output y.
• The neuron is presented with an input pattern x1.
• It calculates the weighted sum $u = w_1 x_1$ and its output as $y = u$ (no threshold is used).
• The error: $E = \frac{1}{2}(t - y)^2$, where t is the target.
• If you draw E against w1, which curve do you get? (A parabola, so it has a single minimum.)
• (Figure: the error E plotted against w1, with the gradient indicated.)
• To find the gradient, differentiate the error E with respect to w1: $\frac{\partial E}{\partial w_1} = -(t - y)\,x_1$
• According to the delta rule, the weight change is proportional to the negative of the error gradient: $\Delta w_1 = -\beta \frac{\partial E}{\partial w_1} = \beta\,(t - y)\,x_1$
• New weight: $w_1^{new} = w_1^{old} + \Delta w_1$
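A small sketch of this update in code: one linear neuron with a single weight trained with the delta rule in example-by-example mode; the toy data (targets generated as t = 2·x1) and the learning rate are mine.

```python
# Online (example-by-example) delta rule for a linear neuron y = w1 * x1.
# Toy data generated from t = 2 * x1, so w1 should approach 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (0.5, 1.0)]   # (x1, target)

w1, beta = 0.0, 0.05          # initial weight, learning rate

for epoch in range(50):
    for x1, t in data:
        y = w1 * x1                   # linear output, no threshold
        grad = -(t - y) * x1          # dE/dw1 for E = 0.5 * (t - y)**2
        w1 = w1 - beta * grad         # delta rule: move against the gradient

print(round(w1, 3))                   # close to 2.0
```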
β is called the learning rate. It determines how far along the gradient we move in each step.
• This is an iterative algorithm; one pass through the training set is not enough.
• One pass over the whole training data set is called an epoch.
• Adjusting the weights after each input pattern presentation (iteration) is called example-by-example (online) learning.
• For some problems this can cause the weights to oscillate – the adjustment required by one pattern may be canceled by the next pattern.
• The next method, batch learning, is more popular.
Batch learning – wait until all input patterns (i.e. the whole epoch) have been processed and then adjust the weights in the average sense.
• A more stable solution.
• Obtain the error gradient for each input pattern.
• Average the gradients at the end of the epoch.
• Use this average value to adjust the weights using the delta rule.
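Continuing the toy example from the delta-rule sketch above, the batch variant averages the per-pattern gradients over the epoch before making a single weight update; the data and learning rate are again my own illustrative choices.

```python
# Batch delta rule: average the per-pattern gradients over one epoch,
# then apply a single weight update (same toy data as the online sketch).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (0.5, 1.0)]   # (x1, target)

w1, beta = 0.0, 0.05

for epoch in range(200):
    grads = [-(t - w1 * x1) * x1 for x1, t in data]   # dE/dw1 for each pattern
    w1 -= beta * sum(grads) / len(grads)              # update with the epoch average

print(round(w1, 3))                                   # close to 2.0
```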