Evaluating Hypotheses
Natural Language Processing Lab, Jang Jung-ho
Overview • Evaluating the accuracy of hypotheses is fundamental to ML - to decide whether to use a given hypothesis - an integral component of many learning systems • Difficulty arises from the limited set of available data - bias in the estimate - variance in the estimate
1. Contents • Methods for evaluating learned hypotheses • Methods for comparing the accuracy of two hypotheses • Methods for comparing the accuracy of two learning algorithms when only a limited set of data is available
2. Estimating Hypothesis Accuracy • Two Questions of Interest 1. Given a hypothesis h and a data sample, what is the best estimate of the accuracy of h over unseen data? 2. What is the probable error in this accuracy estimate?
2. Evaluating… (Cont’d) • Two Definitions of Error
1. Sample error of h with respect to target function f and data sample S:
   errorS(h) = (1/n) * sum over x in S of delta(f(x) != h(x))
   where n = |S| and delta(f(x) != h(x)) is 1 if f(x) != h(x), and 0 otherwise
2. True error of h with respect to target function f and distribution D:
   errorD(h) = Pr over x drawn from D of [f(x) != h(x)]
How good an estimate of errorD(h) is provided by errorS(h)?
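The two definitions can be illustrated with a short Python sketch (the target f, hypothesis h, and the uniform distribution over 0..99 are hypothetical choices for illustration, not from the slides):

```python
import random

def sample_error(h, f, sample):
    """errorS(h): fraction of examples in `sample` that h misclassifies w.r.t. f."""
    return sum(1 for x in sample if h(x) != f(x)) / len(sample)

# Hypothetical target concept f and learned hypothesis h over the integers 0..99
f = lambda x: x >= 50   # true concept
h = lambda x: x >= 55   # hypothesis that errs exactly on x in 50..54

# True error errorD(h) under a uniform distribution D over 0..99 is exactly 5/100
true_error = sum(1 for x in range(100) if h(x) != f(x)) / 100

random.seed(1)  # for reproducibility
S = [random.randrange(100) for _ in range(1000)]
estimate = sample_error(h, f, S)  # a random variable, typically close to 0.05
```

Drawing S afresh would give a different estimate each time; that spread is the variance in the estimate mentioned in the overview.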
2. Evaluating… (Cont’d) • Problems in Estimating Error
1. Bias: if S is the training set, errorS(h) is an optimistically biased estimate
   estimation bias = E[errorS(h)] - errorD(h)
   For an unbiased estimate, h and S must be chosen independently
2. Variance: even with an unbiased S, errorS(h) may vary from errorD(h)
2. Evaluating… (Cont’d) • Estimators
Experiment: 1. Choose a sample S of size n according to distribution D 2. Measure errorS(h)
errorS(h) is a random variable, and an unbiased estimator of errorD(h)
Given an observed errorS(h), what can we conclude about errorD(h)?
2. Evaluating… (Cont’d) • Confidence Interval
if 1. S contains n examples, drawn independently of h and of each other 2. n >= 30
then with approximately N% probability, errorD(h) lies in the interval
   errorS(h) ± zN * sqrt(errorS(h) * (1 - errorS(h)) / n)
where zN is the constant from the Normal table for a two-sided confidence level of N% (e.g. z95 = 1.96)
2. Evaluating… (Cont’d) • Normal Distribution Approximates Binomial Distribution
errorS(h) follows a Binomial distribution, with
   mean errorD(h) and variance errorD(h) * (1 - errorD(h)) / n
Approximate this by a Normal distribution with the same mean and variance (reasonable when n >= 30)
2. Evaluating… (Cont’d) • A More Precise Confidence Interval
if 1. S contains n examples, drawn independently of h and of each other 2. n >= 30
then with approximately 95% probability, errorS(h) lies in the interval
   errorD(h) ± 1.96 * sqrt(errorD(h) * (1 - errorD(h)) / n)
equivalently, errorD(h) lies in the interval
   errorS(h) ± 1.96 * sqrt(errorD(h) * (1 - errorD(h)) / n)
which is approximately
   errorS(h) ± 1.96 * sqrt(errorS(h) * (1 - errorS(h)) / n)
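This interval is a one-line computation; a minimal sketch, assuming the usual two-sided z-values from a Normal table (the helper name `confidence_interval` and the example numbers n = 40, errorS(h) = 0.30 are my own choices):

```python
import math

# Two-sided confidence levels and their z-values from a Normal table
Z = {0.90: 1.64, 0.95: 1.96, 0.98: 2.33, 0.99: 2.58}

def confidence_interval(error_s, n, level=0.95):
    """Approximate N% confidence interval for errorD(h); valid for n >= 30."""
    half = Z[level] * math.sqrt(error_s * (1 - error_s) / n)
    return (error_s - half, error_s + half)

low, high = confidence_interval(0.30, 40)  # about (0.16, 0.44), i.e. 0.30 ± 0.14
```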
2. Evaluating… (Cont’d) • Two-sided and One-sided Bounds
1. Two-sided: what is the probability that errorD(h) is between L and U?
2. One-sided: what is the probability that errorD(h) is at most U?
A 100(1-a)% two-sided confidence interval implies a 100(1-a/2)% one-sided confidence interval with the same upper bound.
3. General Confidence Interval • Consider a set of independent, identically distributed random variables Y1 … Yn, all governed by an arbitrary probability distribution with mean mu and variance sigma^2. Define the sample mean
   Ybar = (1/n) * (Y1 + … + Yn)
• Central Limit Theorem: as n -> infinity, the distribution governing Ybar approaches a Normal distribution with mean mu and variance sigma^2 / n.
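The theorem can be checked empirically; a seeded simulation sketch where each Yi is Uniform(0, 1) (so mu = 1/2 and sigma^2 = 1/12), with the sample size and trial count chosen arbitrarily:

```python
import random
import statistics

random.seed(0)  # for reproducibility
n, trials = 50, 2000

# Each entry is one sample mean Ybar of n i.i.d. Uniform(0,1) draws
sample_means = [statistics.mean(random.random() for _ in range(n))
                for _ in range(trials)]

# CLT: the Ybar values cluster around mu = 0.5 with variance near sigma^2/n = (1/12)/50
mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.variance(sample_means)
```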
3. General Confidence Interval (Cont’d)
1. Pick the parameter p to be estimated: errorD(h)
2. Choose an estimator: errorS(h)
3. Determine the probability distribution that governs the estimator: errorS(h) is governed by a Binomial distribution, approximated by a Normal distribution when n >= 30
4. Find the interval (L, U) such that N% of the probability mass falls within it
4. Difference in Error of Two Hypotheses • Assumptions
- two hypotheses h1, h2
- h1 is tested on sample S1 containing n1 random examples; h2 is tested on sample S2 containing n2 random examples
• Objective - estimate the difference between the two true errors, d = errorD(h1) - errorD(h2)
4. Difference in Error of Two Hypotheses (Cont’d) • Procedure
1. Choose an estimator for d: dhat = errorS1(h1) - errorS2(h2)
2. Determine the probability distribution that governs the estimator dhat
3. Find the interval (L, U) such that N% of the probability mass falls within it
4. Difference in Error of Two Hypotheses (Cont’d) • Hypothesis Test
Ex) S1 and S2 each contain 100 examples; errorS1(h1) = 0.30, errorS2(h2) = 0.20
What is the probability that errorD(h1) > errorD(h2)?
4. Difference in Error of Two Hypotheses (Cont’d) • Solution
1. The problem is equivalent to finding the probability that dhat did not overestimate d by more than 0.10, i.e. Pr(dhat < d + 0.10)
2. From the earlier expression,
   sigma_dhat ≈ sqrt(0.30 * 0.70 / 100 + 0.20 * 0.80 / 100) ≈ 0.061
   so the observed difference dhat = 0.10 corresponds to 1.64 * sigma_dhat
3. The Normal table shows that the associated confidence level for a two-sided interval is 90%, so for a one-sided interval it is 95%: errorD(h1) > errorD(h2) with approximately 95% probability
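The arithmetic in this solution can be checked directly (a sketch of the calculation only; the variable names are mine):

```python
import math

e1, n1 = 0.30, 100  # errorS1(h1) on sample S1
e2, n2 = 0.20, 100  # errorS2(h2) on sample S2

d_hat = e1 - e2  # observed difference, 0.10
sigma_d = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)  # ≈ 0.061
z = d_hat / sigma_d  # ≈ 1.64, the z-value of a 90% two-sided interval
```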
5. Comparing Two Learning Algorithms • What we’d like to estimate:
   E over S drawn from D of [errorD(LA(S)) - errorD(LB(S))]
where L(S) is the hypothesis output by learner L using training set S
But given limited data D0, what is a good estimator?
Could partition D0 into training set S0 and test set T0, and measure
   errorT0(LA(S0)) - errorT0(LB(S0))
Even better, repeat this many times and average the results
5. Comparing Two Learning Algorithms (Cont’d)
1. Partition data D0 into k disjoint test sets T1, T2, …, Tk of equal size, where this size is at least 30.
2. For 1 <= i <= k, do: use Ti as the test set and the remaining data as the training set Si
   Si = {D0 - Ti}, hA = LA(Si), hB = LB(Si), deltai = errorTi(hA) - errorTi(hB)
3. Return the value deltabar = (1/k) * sum over i of deltai
5. Comparing Two Learning Algorithms (Cont’d)
4. Now, use a paired t test on deltabar to obtain a confidence interval
The result: the N% confidence interval estimate for the expected difference delta is
   deltabar ± t(N, k-1) * s_deltabar
where s_deltabar = sqrt( (1 / (k * (k - 1))) * sum over i of (deltai - deltabar)^2 )
and t(N, k-1) is a constant analogous to zN, with k-1 degrees of freedom
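Steps 1-4 reduce to a short computation once the per-fold differences deltai are in hand; a sketch with hypothetical fold differences (the t-value must still be looked up in a t table, e.g. t(95%, 4) = 2.776):

```python
import math

def paired_t_interval(deltas, t_value):
    """N% confidence interval for the mean difference from k paired measurements.
    t_value is t(N, k-1) from a t table; it is an input here, not computed."""
    k = len(deltas)
    mean = sum(deltas) / k
    s = math.sqrt(sum((d - mean) ** 2 for d in deltas) / (k * (k - 1)))
    return (mean - t_value * s, mean + t_value * s)

# Hypothetical per-fold differences errorTi(hA) - errorTi(hB), k = 5
deltas = [0.05, 0.02, 0.04, 0.03, 0.06]
low, high = paired_t_interval(deltas, t_value=2.776)  # 95%, 4 degrees of freedom
```

Since the interval here excludes zero, under these hypothetical numbers LB would be the better learner at the 95% level.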