Classification, continued

Prediction and Classification
• Last week we discussed the classification problem.
• We used the Naïve Bayes method.
• Today we will dive into more details.
• But first: how do we evaluate a classifier?

Abstract Binary Classification Problem
• Given n data samples $(x_1, y_1), \dots, (x_n, y_n)$, where $x_i$ is a data vector and $y_i \in \{-1, 1\}$ is its label.
• The aim is to learn a function $f: X \to \{-1, 1\}$
• such that f is “accurate” on unseen data.
• [This is ill-specified as defined.]

Algorithms to Learn a Classifier
• We can use an algorithm A to learn the function $f: X \to Y$.
• We then write f as $f_A$.
• One example of A is Naïve Bayes.
• Other examples: Logistic Regression, Neural Networks, Support Vector Machines, Decision Trees, Random Forests, …

Training vs. Test Data
• In practice, to take care of the “unseen” part, we split the data into training and test sets.
• We learn $f_A$ on the training set using an algorithm A.
• The learned function $f_A$ is then evaluated on the test set.
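The split described above can be sketched in a few lines. This is a minimal illustration in Python (rather than the Matlab used later in these slides); the helper name `split_train_test`, the 25% test fraction, the fixed seed and the toy data are all assumptions made for the example, not part of the lecture.

```python
import random

def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle labelled samples and split them into (train, test) lists."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (x_i, y_i) pairs with y_i in {-1, 1}.
data = [((i, i % 3), 1 if i % 2 == 0 else -1) for i in range(20)]
train, test = split_train_test(data)
print(len(train), "training samples,", len(test), "test samples")
```

The learning algorithm A would see only `train`; `test` is held back so that the evaluation measures performance on data the classifier has not seen.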

Example
• Suppose we learn a function F on the training set.
• Our test set consists of four data points: (z1, 1), (z2, -1), (z3, 1), (z4, -1).
• We apply F to the four data points (without labels) and get F(z1) = 1, F(z2) = 1, F(z3) = -1 and F(z4) = -1.
• Then F correctly classified z1 and z4 but incorrectly classified z2 and z3.

Confusion Matrix
• Label 1 is called Positive; label -1 is called Negative.
• Let the number of test samples be N, with N = N1 + N2 + N3 + N4. As the formulas below imply, N1 counts true positives, N2 false positives, N3 false negatives and N4 true negatives.
• False Positive Rate (FPR) = N2/(N2+N4)
• True Positive Rate (TPR) = N1/(N1+N3)
• False Negative Rate (FNR) = N3/(N1+N3)
• True Negative Rate (TNR) = N4/(N4+N2)
• Accuracy = (N1+N4)/(N1+N2+N3+N4)
• Precision = N1/(N1+N2)
• Recall = N1/(N1+N3)
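To make the definitions concrete, here is a small Python sketch that counts N1 to N4 and evaluates the rates for the four-point test set from the earlier example. The function names are illustrative, and the reading of N1 to N4 as true positives, false positives, false negatives and true negatives is inferred from the formulas. The sketch assumes every denominator is non-zero.

```python
from fractions import Fraction

def confusion_counts(y_true, y_pred):
    """Return (N1, N2, N3, N4) = (TP, FP, FN, TN) for labels in {-1, 1}."""
    n1 = n2 = n3 = n4 = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            n1 += 1  # true positive
        elif pred == 1 and truth == -1:
            n2 += 1  # false positive
        elif pred == -1 and truth == 1:
            n3 += 1  # false negative
        else:
            n4 += 1  # true negative
    return n1, n2, n3, n4

def metrics(n1, n2, n3, n4):
    """Evaluate the slide's formulas exactly, assuming non-zero denominators."""
    return {
        "FPR": Fraction(n2, n2 + n4),
        "TPR": Fraction(n1, n1 + n3),
        "FNR": Fraction(n3, n1 + n3),
        "TNR": Fraction(n4, n4 + n2),
        "Accuracy": Fraction(n1 + n4, n1 + n2 + n3 + n4),
        "Precision": Fraction(n1, n1 + n2),
        "Recall": Fraction(n1, n1 + n3),
    }

# Test set from the example: true labels of z1..z4 and the predictions F(z1)..F(z4).
y_true = [1, -1, 1, -1]
y_pred = [1, 1, -1, -1]
counts = confusion_counts(y_true, y_pred)
result = metrics(*counts)
for name, value in result.items():
    print(name, "=", value)
```

With one sample in each cell, every rate in this tiny example comes out to 1/2, which matches the observation that F got two of the four points right.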

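Example
• TPR = 10/12 = 5/6; TNR = 20/23; FPR = 3/23; FNR = 2/12 = 1/6
• Accuracy = 30/35 = 6/7
• Precision = 10/13 and Recall = 10/12 = 5/6
• (These values correspond to the counts N1 = 10, N2 = 3, N3 = 2, N4 = 20.)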

ROC (Receiver Operating Characteristic) Curves
• Generally a learning algorithm A will return a real number, but what we want is a label (1 or -1).
• We can apply a threshold T.
• [Figure: two example thresholds, one giving TPR = 3/4, FPR = 2/5 and the other giving TPR = 2/4, FPR = 2/5.]

ROC Curve
• An ROC curve is a plot where the x-axis is the FPR and the y-axis is the TPR; each threshold t contributes the point (FPR(t), TPR(t)).
• Let's look at the Wikipedia ROC entry.
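The construction can be sketched as follows in Python. The scores and labels are made up for illustration, and the sketch assumes the convention that a sample is labelled 1 when its score is at least the threshold t; the slides do not fix that convention.

```python
def roc_points(scores, labels):
    """Return (FPR(t), TPR(t)) pairs, one per threshold t, predicting 1 when score >= t."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == -1)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, -1, 1, -1, -1]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold can only add predicted positives, so both coordinates are non-decreasing and the curve runs from (0, 0) to (1, 1).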

Discussion
• If F: Symptoms → {Disease, No-Disease}
• Higher recall or higher precision?
• What is the relative cost of a misdiagnosis (and in which direction)?
• If F: Banner Ad → {Click, No-Click}
• Does higher precision mean more revenue?

Random Variables
• A random variable (r.v.) is a numerical quantity associated with events in an experiment.
• Suppose we roll two dice. Let X be the sum of the two faces.
• X takes values in {2, …, 12}.
• P(X=12) = 1/36. Why?
• The event associated with X=12 is {(6,6)}.
• P(X=7) = 6/36 = 1/6
• Associated event: {(1,6),(6,1),(2,5),(5,2),(3,4),(4,3)}
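The two-dice probabilities can be checked by enumerating all 36 equally likely outcomes. A short Python sketch, assuming fair dice (variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# X is the sum of the two faces; count how many outcomes give each value.
counts = Counter(a + b for a, b in outcomes)
pmf = {k: Fraction(c, len(outcomes)) for k, c in counts.items()}

# The event associated with X = 7.
event_7 = [o for o in outcomes if sum(o) == 7]

print("P(X=12) =", pmf[12])
print("P(X=7)  =", pmf[7])
print("Event for X=7:", event_7)
```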

Random Variable
• A random variable X can take values in a set which is:
• Discrete and finite
• Toss a coin and let X = 1 if it is a head and X = 0 if it is a tail. X is a random variable.
• Discrete and infinite (countable)
• Let X be the number of accidents in Sydney in a day. Then X = 0, 1, 2, …
• Infinite (uncountable)
• Let X be the height of a Sydney-sider.
• X = 150, 150.11, 150.112, …

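Random Variable Properties
• Let X be a discrete-valued random variable taking values in a set S. The expected (average) value of X is
$$E(X) = \sum_{x \in S} x \, P(X = x)$$
• The variance is
$$\mathrm{Var}(X) = E\big[(X - E(X))^2\big] = \sum_{x \in S} (x - E(X))^2 \, P(X = x)$$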
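For a discrete random variable, the expected value is the probability-weighted sum of its values, and the variance is the probability-weighted sum of squared deviations from that expected value. The sketch below computes both in Python for a single fair die, which is used here only as an illustration; the function names are assumptions.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X): sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X): sum of (x - E(X))^2 * P(X = x) over all values x."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Illustrative distribution: one fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_die = expected_value(die)
var_die = variance(die)
print("E(X) =", mean_die)
print("Var(X) =", var_die)
```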

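Examples
• Let X be a random variable which takes the value 1 with probability p and 0 with probability 1-p. Then
$$E(X) = 1 \cdot p + 0 \cdot (1-p) = p$$
$$\mathrm{Var}(X) = (1-p)^2 \, p + (0-p)^2 \, (1-p) = p(1-p)$$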

Examples
• Let X be a random variable which denotes the number of “spam emails” in a batch of n emails, assuming each email is spam with probability p, independently of the others.
• X takes values in {0, 1, …, n}; for example, {0, 1, 2, 3, 4, 5} when n = 5.
• X is an r.v. which follows a binomial distribution with parameters (n, p): X ~ Binomial(n, p)
• E(X) = np; Var(X) = np(1-p)
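The binomial mean and variance can be verified numerically from the probability mass function. A minimal Python sketch, with n = 5 and p = 0.2 chosen purely for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical batch: n emails, each spam with probability p.
n, p = 5, 0.2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the definition.
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print("E(X)   =", mean, "vs np       =", n * p)
print("Var(X) =", var, "vs np(1-p) =", n * p * (1 - p))
```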

Examples
• Let X be a random variable which denotes the number of TCP packets that arrive in a unit of time. Then X can be modeled as following a Poisson distribution with rate λ: $P(X = k) = e^{-\lambda} \lambda^k / k!$ for k = 0, 1, 2, …
• E(X) = Var(X) = λ
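The claim that the Poisson mean and variance are both λ can be checked numerically. The Python sketch below uses λ = 3 as an arbitrary illustrative rate and truncates the infinite sum at k = 59, where the remaining probability mass is negligible for this rate.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0       # hypothetical arrival rate per unit time
ks = range(60)  # truncation of the infinite support, adequate for small lam

mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print("E(X)   ~", mean)
print("Var(X) ~", var)
```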

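Continuous Distribution
• Of course, the most common continuous distribution is the Normal/Gaussian distribution, denoted $N(\mu, \sigma^2)$, with density
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$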

How to Use Random Variables for Classification
• To use an r.v. in classification, we have to make an assumption.
• For example: Sepal Length follows a Normal distribution.
• Is this a good/reasonable assumption?
• Then we use data to estimate the parameters of the distribution.
• The parameters of a Normal distribution are the mean and the variance (the square of the standard deviation).
• For the moment we can just use Matlab (or another program) to do that.
• Once we have the parameters, we can use the distribution to estimate the “probability” of Sepal Length taking a new value.
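The steps above can be sketched in Python using the standard `statistics` module. The sepal-length values and the new value 6.0 are made up for illustration and are not the Iris data. Note that for a continuous distribution the quantity evaluated is a density, which is why “probability” is in quotes.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sepal-length measurements (cm); not the actual Iris data.
sepal_lengths = [5.1, 4.9, 6.3, 5.8, 7.1, 6.4, 5.0, 5.7]

# Step 1: estimate the parameters from the data.
mu = mean(sepal_lengths)
sigma = stdev(sepal_lengths)  # sample standard deviation (n - 1 denominator)

# Step 2: use the fitted Normal distribution to score a new value.
fitted = NormalDist(mu, sigma)
new_value = 6.0
density = fitted.pdf(new_value)

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"density at {new_value} = {density:.4f}")
```

In a Naïve Bayes classifier, one such distribution would be fitted per class and per attribute, and the densities would take the place of the discrete probabilities used previously.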

Fitting Distributions: Examples
• 0, 1, 0, 1, 0, 0
• Assume the data come from a Binomial distribution with 6 trials and 2 successes.
• In Matlab: >> binofit(2,6) gives ans = 0.3333
• 10, 20, 5, 3, 3, 100
• Assume the data are from a Poisson distribution.
• >> X = [10 20 5 3 3 100];
• >> poissfit(X)
• ans = 23.50
• What is happening? We are just taking sample averages. The more data we have, the more reliable these estimates become.
• Suppose we take Sepal Length as a data vector x: >> [mu, sigma] = normfit(x) gives mu = 5.8, sigma = 0.81
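The first two fits can be reproduced by hand, since each estimate is just a sample average. The sketch below is written in Python rather than Matlab and mirrors only the point estimates; the variable names are illustrative.

```python
from statistics import mean

# Binomial example: 6 trials, of which 2 are successes.
coin = [0, 1, 0, 1, 0, 0]
p_hat = sum(coin) / len(coin)  # fraction of successes

# Poisson example: the estimated rate is the sample mean.
arrivals = [10, 20, 5, 3, 3, 100]
lam_hat = mean(arrivals)

print(f"p_hat   = {p_hat:.4f}")
print(f"lam_hat = {lam_hat:.2f}")
```

Note how the single large observation (100) pulls the Poisson estimate well above most of the data, a reminder that estimates from six points are not very reliable.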

Return to the Iris Example
• We will redo the Iris classification example, but now we will use “continuous” values for the attributes.
