Evaluating Hypotheses • Introduction • Estimating Accuracy • Sampling Theory and Confidence Intervals • Differences in Error • Comparing Learning Algorithms
Introduction Given some training data, a learning algorithm produces a hypothesis. The next step is to estimate the accuracy of the hypothesis on testing data. [Diagram: Data → Learning Algorithm → Hypothesis → Performance Assessment] • How precise is our estimate? • There are two difficulties: • Bias in the estimate • Variance in the estimate
Bias and Variance • Bias in the estimate: accuracy measured on the training data is normally overoptimistic. To avoid this bias we use a separate set of data. • Variance in the estimate: the estimate varies from sample to sample. The smaller the sample, the larger the variance. [Figure: estimated accuracy vs. sample size, showing the true accuracy, the bias, and the variance of the estimate]
Estimating Accuracy Examples in the input space X are randomly distributed according to some probability distribution D. [Figure: probability p(X) over the input space X] • Questions: 1. Given a hypothesis h and a dataset with n examples drawn at random according to D, what is the accuracy of h on future examples? 2. What is the error in this accuracy estimate?
Example Classification of mushrooms. Some mushrooms are more likely to show up than others. [Figure: frequency vs. size, marking the sizes that are more likely to appear]
Sample Error and True Error Sample error: $\mathrm{error}_S(h) = \frac{1}{n} \sum_{X \in S} \delta(f(X), h(X))$, where f is the true target function, h is the hypothesis, S is the sample of n examples, and $\delta(a,b) = 1$ if $a \neq b$, 0 otherwise. True error: $\mathrm{error}_D(h) = P_{X \sim D}[f(X) \neq h(X)]$. How good is $\mathrm{error}_S(h)$ at estimating $\mathrm{error}_D(h)$?
Example There are 4 mushrooms in our dataset, {X1, X2, X3, X4}, out of a space of 6 possible mushrooms. The probability distribution is P(X1) = 0.2, P(X2) = 0.1, P(X3) = 0.3, P(X4) = 0.1, P(X5) = 0.2, P(X6) = 0.1. Our hypothesis classifies X1, X2, and X3 correctly but not X4. The sample error is ¼ (0 + 0 + 0 + 1) = ¼ = 0.25. Our hypothesis also classifies X6 correctly but not X5. The true error is 0.2(0) + 0.1(0) + 0.3(0) + 0.1(1) + 0.2(1) + 0.1(0) = 0.3
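The mushroom example can be checked with a few lines of Python. This is a minimal sketch: the dictionary encoding and the function names `sample_error` and `true_error` are illustrative choices, not part of the original slides.

```python
# Probabilities of the six possible mushrooms, taken from the example.
prob = {"X1": 0.2, "X2": 0.1, "X3": 0.3, "X4": 0.1, "X5": 0.2, "X6": 0.1}
# True means the hypothesis classifies that mushroom correctly.
correct = {"X1": True, "X2": True, "X3": True,
           "X4": False, "X5": False, "X6": True}
# The four mushrooms that happen to be in our dataset.
sample = ["X1", "X2", "X3", "X4"]

def sample_error(sample, correct):
    """Fraction of sampled examples that the hypothesis misclassifies."""
    return sum(0 if correct[x] else 1 for x in sample) / len(sample)

def true_error(prob, correct):
    """Probability mass of all examples that the hypothesis misclassifies."""
    return sum(p for x, p in prob.items() if not correct[x])

print(sample_error(sample, correct))           # 0.25
print(round(true_error(prob, correct), 10))    # 0.3
```

The sample error ignores X5 and X6 entirely, which is why it differs from the true error here.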
Confidence Intervals on Sample Error • Assume the following conditions hold: • The sample has n examples drawn independently according to probability distribution D. • n > 30 • Hypothesis h has made r errors on the n examples, so $\mathrm{error}_S(h) = r/n$. • Then with approximately 95% probability, the true error lies in the interval: $\mathrm{error}_S(h) \pm 1.96 \sqrt{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h)) / n}$. For example, if n = 40 and r = 12, then with 95% confidence the true error lies in the interval 0.30 ± 0.14.
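The n = 40, r = 12 example can be reproduced as follows. This is a sketch under the slide's assumptions (independent draws, n > 30, normal approximation); the half-width is 1.96 times the square root of errorS(h)(1 - errorS(h))/n, and the function name `error_confidence_interval` is hypothetical.

```python
import math

def error_confidence_interval(r, n, z=1.96):
    """Approximate two-sided interval for the true error.

    r: number of errors, n: sample size, z: normal quantile
    (1.96 corresponds to 95% confidence).
    """
    e = r / n                                   # sample error
    half_width = z * math.sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width

low, high = error_confidence_interval(12, 40)
print(f"{low:.2f} to {high:.2f}")               # 0.16 to 0.44
```

The interval is centered on 0.30 with half-width about 0.14, matching the example.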
Sampling Error and the Binomial Distribution How much does the size of the dataset affect the difference between the sample error and the true error? We have a sample of size n obtained according to distribution D. Instances are drawn independently of each other. The probability of observing a given number of errors in the sample can be modeled with the binomial distribution.
Combinations • Assume we wish to select r objects from n objects. • We do not care about the order in which we select the r objects. • The number of possible combinations of r objects from n objects is $\frac{n(n-1)(n-2)\cdots(n-r+1)}{r!} = \frac{n!}{(n-r)!\,r!}$ • We denote this number as C(n,r)
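As a quick check of the formula, the falling product divided by r! can be compared with Python's built-in `math.comb`. The helper name `combinations_count` is an illustrative choice.

```python
import math

def combinations_count(n, r):
    """C(n, r) computed as n(n-1)...(n-r+1) / r!."""
    numerator = 1
    for i in range(r):
        numerator *= n - i          # n, n-1, ..., n-r+1
    return numerator // math.factorial(r)

print(combinations_count(5, 2))     # 10
print(math.comb(5, 2))              # 10
```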
The Mean • Let X be a discrete random variable that takes the values $x_1, x_2, x_3, \ldots, x_n$. Let $P(x_1), P(x_2), P(x_3), \ldots, P(x_n)$ be their respective probabilities. Then the expected value of X, E(X), is defined as $E(X) = x_1 P(x_1) + x_2 P(x_2) + \cdots + x_n P(x_n) = \sum_i x_i P(x_i)$
The Variance • Let X be a discrete random variable that takes the values $x_1, x_2, x_3, \ldots, x_n$. Let $\mu$ be the mean. Then the variance is: $\mathrm{variance}(X) = E[(X - \mu)^2]$
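Both definitions translate directly into code. The sketch below uses an assumed illustration: a single classification viewed as a random variable that is 1 (error) with probability 0.3 and 0 (correct) with probability 0.7. The function names are hypothetical.

```python
def expected_value(values, probs):
    """E(X) = sum of x_i * P(x_i)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """E[(X - mean)^2] for a discrete random variable."""
    mu = expected_value(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

values = [0, 1]          # 0 = correct, 1 = error
probs = [0.7, 0.3]       # assumed probability of error: 0.3
print(round(expected_value(values, probs), 10))   # 0.3
print(round(variance(values, probs), 10))         # 0.21
```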
Binomial Distribution • What is the probability of getting x successes in n trials? • Assumption: all trials are independent and the probability of success remains the same. Let p be the probability of success and let q = 1 − p. Then the binomial distribution is defined as $P(x) = C(n,x)\, p^x q^{n-x}$ for x = 0, 1, 2, …, n. The mean equals np. The variance equals npq.
The Sampling Error The sampling error can be modeled using a binomial distribution. For example, suppose we have a dataset of size n = 40 and the true probability of error is 0.3. Then the expected number of errors is np = 40(0.3) = 12. [Figure: plot of the binomial distribution over the number of errors, from 0 to 40]
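The distribution for n = 40 and p = 0.3 can be tabulated directly from the binomial formula. This is a small sketch; `binomial_pmf` is an illustrative name, and the mean and variance are computed from the probabilities rather than taken from the closed forms np and npq.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x errors in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 40, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
most_likely = max(range(n + 1), key=lambda x: pmf[x])

print(round(mean, 6))    # 12.0, which equals n*p
print(round(var, 6))     # 8.4, which equals n*p*q
print(most_likely)       # 12
```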
Bias and Variance If r is the number of errors in our dataset of size n, then the sample error is r/n, which is our estimate of the true error p. The bias of an estimator Y is E[Y] − p. The sample error is an unbiased estimator of the true error because the expected value of r/n is p. Its standard deviation is approximately $\sqrt{\mathrm{error}_S(h)\,(1 - \mathrm{error}_S(h)) / n}$
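A short simulation illustrates both claims: the average of r/n over many samples is close to p, and its spread is close to the square root of p(1 − p)/n. This is an illustrative sketch with assumed values p = 0.3 and n = 40 and a fixed random seed; the function name is hypothetical.

```python
import random
import statistics

def simulate_sample_errors(p, n, trials, seed=0):
    """Draw `trials` samples of size n and return the r/n estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each example is misclassified independently with probability p.
        r = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(r / n)
    return estimates

p, n = 0.3, 40
estimates = simulate_sample_errors(p, n, trials=2000)
theory_std = (p * (1 - p) / n) ** 0.5

print("mean of r/n:", statistics.mean(estimates))      # close to p
print("std of r/n: ", statistics.pstdev(estimates))    # close to theory_std
print("theory std: ", theory_std)
```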
Differences in Error Suppose we are comparing two hypotheses from two algorithms, say a decision tree and a neural network. [Figure: hypotheses h1 and h2 separating examples of Type A and Type B]
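Differences in Error We would like to compute the difference in error d = error(h1) − error(h2), where h1 is tested on a sample of size n1 and h2 on a sample of size n2. Variable d can be considered approximately normal, because the difference of two normally distributed variables is also normally distributed. Its variance is the sum of the variances of both errors. So the confidence interval is defined as: $d \pm z_N \sqrt{\frac{\mathrm{error}(h_1)(1 - \mathrm{error}(h_1))}{n_1} + \frac{\mathrm{error}(h_2)(1 - \mathrm{error}(h_2))}{n_2}}$, where $z_N$ is the normal quantile for the chosen confidence level (1.96 for 95%).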
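The interval for the difference can be sketched in code as below. The numbers are hypothetical (error 0.30 on 100 test examples for h1, error 0.20 on 100 test examples for h2), and the sketch assumes the two test samples are independent, which is what allows the variances to be added.

```python
import math

def difference_interval(e1, n1, e2, n2, z=1.96):
    """Approximate interval for error(h1) - error(h2).

    e1, e2: sample errors; n1, n2: test-sample sizes;
    z: normal quantile (1.96 for 95% confidence).
    """
    d = e1 - e2
    std = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d - z * std, d + z * std

low, high = difference_interval(0.30, 100, 0.20, 100)
print(f"{low:.3f} to {high:.3f}")    # -0.019 to 0.219
```

With these made-up numbers the interval contains 0, so the observed difference of 0.10 would not by itself show that h2 is better at the 95% level.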
Comparing Learning Algorithms We have two algorithms LA and LB. How do we know which one is better on average for learning some target function? We want to compute the expected value of the difference in their performance according to distribution D: $E_D[\mathrm{error}(L_A) - \mathrm{error}(L_B)]$
Comparing Learning Algorithms With a limited sample, what we can do is the following: For i = 1 to k: (1) Separate the data into a training set and a testing set. (2) Train LA and LB on the training set. (3) Compute the difference in their errors on the testing set: $d_i = \mathrm{error}(L_A) - \mathrm{error}(L_B)$. End. Return $d_{mean} = \frac{1}{k} \sum_i d_i$
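The loop can be sketched in Python, training each learner on the training portion of every split and measuring error on the held-out testing portion. Everything below is a hypothetical illustration: the two toy learners, the one-dimensional dataset, the 30% test fraction, and all function names are assumptions, not part of the slides.

```python
import random

def error(h, examples):
    """Fraction of (x, y) examples that hypothesis h misclassifies."""
    return sum(h(x) != y for x, y in examples) / len(examples)

def compare_learners(data, learner_a, learner_b, k=10,
                     test_fraction=0.3, seed=0):
    """Return the mean error difference over k random train/test splits."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(k):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        h_a, h_b = learner_a(train), learner_b(train)   # train on training set
        diffs.append(error(h_a, test) - error(h_b, test))
    return sum(diffs) / k, diffs

def majority_learner(train):
    """Toy learner: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def threshold_learner(train):
    """Toy learner: cut halfway between the two class means.

    Assumes both classes appear in the training set.
    """
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    cut = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > cut else 0

# Toy data: label is 1 exactly when x >= 10.
data = [(x, 1 if x >= 10 else 0) for x in range(20)]
d_mean, diffs = compare_learners(data, majority_learner, threshold_learner)
print("mean difference in error:", d_mean)
```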
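Comparing Learning Algorithms The estimated standard deviation of this statistic is: $s = \sqrt{\frac{1}{k(k-1)} \sum_i (d_i - d_{mean})^2}$. So the confidence interval is: $d_{mean} \pm t_{N,k-1}\, s$, where $t_{N,k-1}$ is the value of the t distribution for confidence level N with k − 1 degrees of freedom.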
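Given the k differences, the interval can be computed as sketched below. The five differences are made-up numbers, and the t value is passed in by hand (2.776 is the two-sided 95% value for 4 degrees of freedom) rather than looked up by a library; the function name is hypothetical.

```python
import math

def paired_difference_interval(diffs, t_value):
    """Mean difference and its t-based confidence interval.

    diffs: the k per-split differences d_i.
    t_value: t quantile for the chosen confidence and k-1 degrees of freedom.
    """
    k = len(diffs)
    d_mean = sum(diffs) / k
    s = math.sqrt(sum((d - d_mean) ** 2 for d in diffs) / (k * (k - 1)))
    return d_mean, d_mean - t_value * s, d_mean + t_value * s

diffs = [0.02, 0.04, 0.01, 0.03, 0.05]    # made-up differences, k = 5
d_mean, low, high = paired_difference_interval(diffs, t_value=2.776)
print(f"{d_mean:.3f} ({low:.3f} to {high:.3f})")   # 0.030 (0.010 to 0.050)
```

In this made-up case the whole interval lies above 0, which would suggest that LA has the higher error on average.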
Summary and Conclusions • An estimate of the accuracy of a hypothesis is itself subject to error. • The two sources of estimation error are bias and variance. • One can compute confidence intervals using statistical theory. • The sampling error can be modeled using a binomial distribution. • Differences in error between learning algorithms can be estimated by averaging over multiple train/test subsamples.