This text explains the process of hypothesis testing, using examples such as typhoon days off, salary comparisons, and testing the weight of cherry tomato packages. It covers one-sided and two-sided tests, choosing the appropriate test, and understanding null hypotheses and critical values.
Hypothesis testing (Moore Chapters 6, 7, and 8; Guan Chapter 8)
Research questions • Can we have a typhoon day off? • No day off, but a windy day • A day off, but a sunny day
Research questions • Do you love the right person? • If your goal is to have a happy marriage!
Research questions • Who has a higher starting salary? • The starting monthly salary for people with a bachelor's degree is NT$26,577 in Taiwan and about NT$70,000 in Korea. • The average annual salary (PPP-adjusted) is US$29,600 in Taiwan and US$24,500 in Korea. • How can we test whether the difference is statistically significant? • What do those statistics actually mean? • Although Korea's overall unemployment rate is lower than Taiwan's, its youth unemployment rate is much higher. • The surveys are not directly comparable: Taiwan's figure comes from a census, while Korea's covers roughly the top 1,000 firms.
Stating hypotheses A test of statistical significance (顯著性檢定) tests a specific hypothesis using sample data to decide on the validity of the hypothesis. In statistics, a hypothesis (假設) is an assumption or a theory about the characteristics of one or more variables in one or more populations. What you want to know: Does the calibrating machine that sorts cherry tomatoes into packs need revision? The same question reframed statistically: Is the population mean µ for the distribution of weights of cherry tomato packages equal to 227 g (i.e., half a pound)?
Stating hypotheses The statement being tested in a test of significance is called the null hypothesis (虛無假設),H0. The test of significance is designed to assess the strength of the evidence against the null hypothesis. It is usually a statement of “no effect” or “no difference.” The alternative hypothesis(對立假設) is the statement we suspect is true instead of the null hypothesis. It is labeled H1 (or Ha.) Weight of cherry tomato packs: H0 : µ = 227 g (µ is the average weight of the population of packs) H1 : µ ≠ 227 g (µ is either larger or smaller)
One-sided and two-sided tests • A two-tailed (雙尾) or two-sided test of the population mean has these null and alternative hypotheses: H0: µ = [a specific number] versus Ha: µ ≠ [a specific number]. • A one-tailed (單尾) or one-sided test of a population mean has these null and alternative hypotheses: H0: µ = [a specific number] versus Ha: µ < [a specific number], OR H0: µ = [a specific number] versus Ha: µ > [a specific number]. • Example: the FDA tests whether a generic (沒有商標的) drug has an absorption extent similar to the known absorption extent of the brand-name drug it copies. Higher or lower absorption would both be problematic, so we test H0: µgeneric = µbrand versus Ha: µgeneric ≠ µbrand, a two-sided test.
How to choose? What determines the choice of a one-sided versus a two-sided test is what we know about the problem before we perform the test of statistical significance. A company tests whether the mean volume of tea in its bottles is 500 ml, as stated on the label. Here the company is mainly concerned that the bottles contain less than advertised, which could lead to consumer complaints of false advertising. Thus this is a one-sided test: H0: µ = 500 ml versus Ha: µ < 500 ml. It is important to make this choice before performing the test; otherwise you could make a choice of "convenience" or fall into circular logic.
Testing • Testing (檢定): the researcher uses a test statistic (檢定統計量) to decide, under a specified standard, whether or not the null hypothesis is true. • If the evidence indicates the null hypothesis is not true, we reject the null hypothesis (拒絕虛無假設). If we cannot find evidence against the null, we accept (or, more precisely, cannot reject) the null hypothesis. • For example, suppose the sample mean is 80 and we want to test whether the population mean is 70 (the null hypothesis). We might feel that the sample mean (80) is far from the hypothesized population mean, but we need a test statistic to make this judgment precise.
Testing • Once we have set the null hypothesis, we need to select a proper test statistic. • To test whether a population mean is equal to a constant (for example, H0: µ0 = 70 versus H1: µ0 ≠ 70), we use the test statistic T = (X̄n − 70)/(Sn/√n). • When the value of the test statistic is too large or too small, we reject the null hypothesis. • How large or small is determined by the t-distribution. • The larger the absolute value of the t statistic, the smaller the corresponding probability.
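As a sketch, the t statistic and its P-value can be computed as follows, using a purely hypothetical sample `x` (the numbers are illustrative only):

```python
import numpy as np
from scipy import stats

# Hypothetical sample (illustrative values only)
x = np.array([82, 75, 69, 88, 71, 90, 77, 84, 66, 79, 73, 85, 80, 68, 91, 76], dtype=float)

mu0 = 70                          # value stated in the null hypothesis
n = len(x)
t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))   # T = (x̄ - 70) / (S/√n)

# Two-sided P-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(t_stat, p_value)

# scipy runs the same test in one call
print(stats.ttest_1samp(x, popmean=mu0))
```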
Null hypothesis and critical value • Once we have the test statistic, we obtain its null distribution (虛無分配) under the assumption that the null hypothesis is true. • For example, if T(X1, …, Xn; 70) is the test statistic, then the null distribution is N(0, 1). • Next, we select a small probability α as the significance level (顯著水準), usually 0.01, 0.05, or 0.1. The significance level represents the researcher's subjective threshold for what counts as an extreme value. • Once the null distribution and the significance level are determined, we can find the critical value (臨界值) in a way similar to constructing a confidence interval.
The P-value Tests of statistical significance quantify the chance of obtaining a particular random sample result if the null hypothesis were true. This quantity is the P-value. It is a way of assessing the "believability" of the null hypothesis given the evidence provided by a random sample. • For example, in the one-sided test of a sample mean illustrated below, do we accept the null? [Figure: the null distribution with a lower-tail rejection region. The observed t-value of −1.34 corresponds to a P-value of 0.09 (blue and orange area), while the critical value of −1.645 cuts off a lower-tail area of 0.05 (blue area).]
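The numbers in the figure can be reproduced from the standard normal null distribution, as a quick sketch:

```python
from scipy import stats

# Lower-tail test: the critical value cuts off an area of alpha = 0.05,
# and the P-value is the area below the observed t-value of -1.34.
critical_value = stats.norm.ppf(0.05)   # ≈ -1.645
p_value = stats.norm.cdf(-1.34)         # ≈ 0.090

# Since -1.34 > -1.645 (equivalently, P-value 0.09 > 0.05), we do not reject H0.
print(critical_value, p_value)
```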
Interpreting a P-value Could random variation alone account for the difference between the null hypothesis and observations from a random sample? • A small P-value implies that random variation because of the sampling process alone is not likely to account for the observed difference. • With a small P-value we reject H0. The true property of the population is significantly different from what was stated in H0. Thus, small P-values are strong evidence AGAINST H0. But how small is small…?
Significant P-value??? [Figure: shaded tail areas illustrating P-values of 0.2758, 0.1711, 0.0892, 0.0735, 0.05, and 0.01.] When the shaded area becomes very small, the probability of drawing such a sample at random gets very slim. Often a P-value of 0.05 or less is considered significant: the phenomenon observed is unlikely to be entirely due to a chance event arising from random sampling.
Rejection region for a two-tailed test of µ with α = 0.05 (5%) A two-sided test means that α is split between both tails of the curve, leaving: • a middle area C of 1 − α = 95%, and • an area of α/2 = 0.025 in each tail (see Table C). You should remember z* for the 10%, 5%, and 1% levels.
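As a sketch, those z* values can be recovered (or checked) directly:

```python
from scipy import stats

# Two-sided critical values z*: an area of alpha/2 in each tail of N(0, 1)
for alpha in (0.10, 0.05, 0.01):
    z_star = stats.norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f}  ->  z* = {z_star:.3f}")   # 1.645, 1.960, 2.576
```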
Testing procedures • State the null hypothesis H0 and the alternative hypothesis Ha. The test is designed to assess the strength of evidence against H0; Ha is the statement we will accept if we reject H0. • Select a significance level. • Calculate the test statistic. • Find the P-value for the observed data. • State a conclusion: if the P-value ≤ α, reject H0; if the P-value > α, accept (do not reject) H0.
Example 1 • {X1, X2, …, Xn} are i.i.d. N(μ0, σ0²) random variables, and we test H0: σ0 = 10 versus H1: σ0 ≠ 10. • This is a two-sided test. • The test statistic is (n − 1)Sn²/100. • The null distribution is χ²(n − 1). • Significance level 0.05 (α = 0.05). • Rejection region (拒絕域): [0, 8.906) ∪ (32.852, ∞). • If the value of the test statistic falls within the rejection region, we reject the null hypothesis.
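A sketch of Example 1: the rejection region on the slide matches the χ²(19) quantiles, so n = 20 is assumed here, and the sample variance is hypothetical.

```python
from scipy import stats

# Two-sided test of H0: sigma0 = 10 using (n - 1)S^2 / 100 ~ chi-square(n - 1).
n, alpha = 20, 0.05                                # n = 20 implied by the slide's quantiles
lower = stats.chi2.ppf(alpha / 2, df=n - 1)        # ≈ 8.907
upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)    # ≈ 32.852

s2 = 185.0                                         # hypothetical sample variance
test_stat = (n - 1) * s2 / 100
reject = test_stat < lower or test_stat > upper
print(lower, upper, test_stat, reject)
```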
Example 2 • {X1, X2, …, Xn} are i.i.d. N(μ0, σ0²) random variables, and we test H0: μ0 = 70 versus H1: μ0 > 70 (α = 0.05). • This is a one-sided test. • Test statistic (with σ0 known): Z = √n (X̄n − 70)/σ0. • Null distribution: N(0, 1). • Rejection region: (1.645, ∞). • If the value of the test statistic falls within the rejection region, we reject the null hypothesis.
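A sketch of Example 2, assuming a hypothetical known σ0 and illustrative data:

```python
import numpy as np
from scipy import stats

# One-sided test of H0: mu0 = 70 vs H1: mu0 > 70 at alpha = 0.05,
# assuming sigma0 is known so that Z = sqrt(n)(x̄ - 70)/sigma0 is N(0, 1) under H0.
sigma0 = 10.0                                                    # hypothetical known S.D.
x = np.array([72.0, 81.5, 69.0, 77.5, 74.0, 79.0, 71.5, 76.0])   # illustrative data
n = len(x)

z = np.sqrt(n) * (x.mean() - 70) / sigma0
critical = stats.norm.ppf(0.95)      # ≈ 1.645, upper-tail rejection region (1.645, ∞)
print(z, critical, z > critical)
```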
Exact test and large sample test • According to the null distribution of the test statistic, we distinguish two cases: • Exact test (實際檢定): the null distribution is an exact distribution. • Large sample test (大樣本檢定): the null distribution of the test statistic is a limiting distribution (極限分配).
Tests on population mean and variance • {X1, …, Xn} are i.i.d. N(μ0, σ0²) random variables. To test whether the population mean is equal to b, we have two test statistics, depending on whether the population variance is known: • σ0² is known: Z = (X̄n − b)/(σ0/√n), whose null distribution is N(0, 1). • σ0² is unknown: T = (X̄n − b)/(Sn/√n), whose null distribution is t(n − 1). • To test whether the population S.D. is equal to b, we use the test statistic (n − 1)Sn²/b², whose null distribution is χ²(n − 1).
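To make the three statistics concrete, here is a small sketch; the helper names (`mean_test`, `sd_test`) are mine, not from the text:

```python
import numpy as np
from scipy import stats

def mean_test(x, b, sigma0=None):
    """Test H0: mu = b. Uses the z statistic if sigma0 is known, otherwise the t statistic."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if sigma0 is not None:                                    # variance known: Z ~ N(0, 1) under H0
        z = (x.mean() - b) / (sigma0 / np.sqrt(n))
        return z, 2 * stats.norm.sf(abs(z))
    t = (x.mean() - b) / (x.std(ddof=1) / np.sqrt(n))         # variance unknown: t(n - 1) under H0
    return t, 2 * stats.t.sf(abs(t), df=n - 1)

def sd_test(x, b):
    """Test H0: sigma = b using (n - 1)S^2 / b^2, which is chi-square(n - 1) under H0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    stat = (n - 1) * x.var(ddof=1) / b**2
    p = 2 * min(stats.chi2.cdf(stat, n - 1), stats.chi2.sf(stat, n - 1))   # two-sided P-value
    return stat, p
```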
Tests on two population means • There are two samples of i.i.d. random variables, {X1, …, Xn} and {Y1, …, Ym}, with distributions N(µX, σ0²) and N(µY, σ0²). To test H0: µX = µY, the test statistics are: • σ0² is known: Z = (X̄ − Ȳ)/(σ0 √(1/n + 1/m)), whose null distribution is N(0, 1). • σ0² is unknown and the variances are equal: T = (X̄ − Ȳ)/(Sp √(1/n + 1/m)), where Sp² = [(n − 1)SX² + (m − 1)SY²]/(n + m − 2) is the pooled sample variance; the null distribution is t(n + m − 2).
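A sketch of the pooled two-sample t test with hypothetical data; `scipy.stats.ttest_ind` is used only as a cross-check:

```python
import numpy as np
from scipy import stats

# Pooled two-sample t test of H0: muX = muY when the common variance is unknown.
# The samples below are hypothetical, for illustration only.
x = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2])
y = np.array([4.6, 4.9, 4.5, 4.8, 4.4, 4.7])

n, m = len(x), len(y)
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)   # pooled variance
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))
p_value = 2 * stats.t.sf(abs(t_stat), df=n + m - 2)

print(t_stat, p_value)
print(stats.ttest_ind(x, y, equal_var=True))   # same result from scipy
```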
Tests on two population variances • There are two samples of i.i.d. random variables, {X1, …, Xn} and {Y1, …, Ym}, with distributions N(µX, σX²) and N(µY, σY²). To test H0: σX² = σY², the test statistic is F = SX²/SY², whose null distribution is F(n − 1, m − 1). If SX² is larger than SY², we can set the alternative hypothesis σX² > σY² and use the critical value c1−α in the right tail of F(n − 1, m − 1). If SX² is smaller than SY², we use cα in the left tail of F(n − 1, m − 1).
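A sketch of the F test with hypothetical samples, following the right-tail/left-tail rule above:

```python
import numpy as np
from scipy import stats

# F test of H0: sigmaX^2 = sigmaY^2 using F = SX^2 / SY^2 ~ F(n - 1, m - 1) under H0.
# Hypothetical samples, for illustration only.
x = np.array([10.2, 9.7, 11.1, 10.5, 9.9, 10.8, 10.1, 9.5])
y = np.array([10.0, 10.3, 9.8, 10.1, 10.2, 9.9, 10.0])

n, m = len(x), len(y)
f_stat = x.var(ddof=1) / y.var(ddof=1)
alpha = 0.05

if f_stat >= 1:    # SX^2 larger: compare with the right-tail critical value
    critical = stats.f.ppf(1 - alpha, n - 1, m - 1)
    reject = f_stat > critical
else:              # SX^2 smaller: compare with the left-tail critical value
    critical = stats.f.ppf(alpha, n - 1, m - 1)
    reject = f_stat < critical

print(f_stat, critical, reject)
```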
Large sample test • When the random variable has an unknown distribution, or does not follow a normal distribution, we cannot obtain the exact distribution of the test statistic under the null. We then usually derive the limiting distribution and use the critical values of that limiting distribution. • For example, the central limit theorem tells us that √n (X̄n − μ)/σ converges in distribution to N(0, 1) as n → ∞.
Example 3 • Survey 100 students; 57 of them oppose the proposal. Test whether the proportion of opponents is equal to the proportion of proponents (H0: p = 0.5). • Because the population distribution is not normal, we use a large sample test with Tn = (p̂ − 0.5)/√(0.5 × 0.5/n) = (0.57 − 0.5)/0.05 = 1.4. By the CLT, the limiting distribution of Tn is N(0, 1). The two-sided critical values at the 5% and 10% significance levels are ±1.96 and ±1.645; since |Tn| = 1.4 is smaller than both, we do not reject the null hypothesis.
Example 4 • Survey 1000 students; 570 of them oppose the proposal. Again test whether the proportion of opponents is equal to the proportion of proponents. • Because the population distribution is not normal, we use a large sample test: Tn = (0.57 − 0.5)/√(0.5 × 0.5/1000) ≈ 4.43. Obviously, we reject the null hypothesis even at the 1% significance level (critical values ±2.576), a result different from Example 3.
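A small sketch that reproduces both examples; the helper name `proportion_z_test` is mine:

```python
import numpy as np
from scipy import stats

def proportion_z_test(opponents, n, p0=0.5):
    """Large-sample test of H0: p = p0 based on Tn = (p_hat - p0) / sqrt(p0(1 - p0)/n)."""
    p_hat = opponents / n
    t_n = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * stats.norm.sf(abs(t_n))      # two-sided P-value from N(0, 1)
    return t_n, p_value

print(proportion_z_test(57, 100))    # Example 3: Tn = 1.40, do not reject
print(proportion_z_test(570, 1000))  # Example 4: Tn ≈ 4.43, reject even at alpha = 0.01
```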
Statistical significance and practical significance Statistical significance only says whether the effect observed is likely to be due to chance alone because of random sampling. Statistical significance may not be practically important. That’s because statistical significance doesn’t tell you about the magnitude of the effect, only that there is one. An effect could be too small to be relevant. With a large enough sample size, significance can be reached even for the tiniest effect. • A drug to lower temperature is found to reproducibly lower patient temperature by 0.4°C (P-value < 0.01). But clinical benefits of temperature reduction only appear for a 1° decrease or larger.
The power of a test The power (檢定力) of a test of hypothesis with fixed significance level α is the probability that the test will reject the null hypothesis when the alternative is true. In other words, power is the probability that the data gathered in an experiment will be sufficient to reject a wrong null hypothesis. Knowing the power of your test is important: • When designing your experiment: Select a sample size large enough to detect an effect of a magnitude you think is meaningful. • When a test found no significance: Check that your test would have had enough power to detect an effect of a magnitude you think is meaningful.
Test of hypothesis at significance level α = 5%: H0: µ = 0 versus H1: µ > 0. Can an exercise program increase bone density? We assume that σ = 2 for the percent change in bone density and would consider a percent increase of 1 medically important. Is 25 subjects a large enough sample for this project? A significance level of 5% implies an upper tail area of 5% and z* = 1.645. Thus H0 is rejected whenever x̄ > z* × σ/√n = 1.645 × 2/√25 = 0.658: all sample averages larger than 0.658 will result in rejecting the null hypothesis.
What if the null hypothesis is wrong and the true population mean is 1? The power against the alternative µ = 1% is the probability that H0 will be rejected when in fact µ = 1%: power = P(x̄ > 0.658 | µ = 1) = P(Z > (0.658 − 1)/(2/√25)) = P(Z > −0.855) ≈ 0.80. So a sample size of 25 yields a power of about 80%. A test power of 80% or more is considered good statistical practice.
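A sketch reproducing the rejection threshold of 0.658 and the power of about 80%:

```python
import numpy as np
from scipy import stats

# Power of the one-sided test H0: mu = 0 vs H1: mu > 0 at alpha = 0.05,
# with sigma = 2, n = 25, against the alternative mu = 1 (bone density example).
sigma, n, alpha, mu_alt = 2.0, 25, 0.05, 1.0

se = sigma / np.sqrt(n)                            # standard error of the sample mean
threshold = stats.norm.ppf(1 - alpha) * se         # reject H0 when x̄ > 0.658

power = stats.norm.sf((threshold - mu_alt) / se)   # P(x̄ > threshold | mu = 1)
print(threshold, power)                            # ≈ 0.658, ≈ 0.80
```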
Increasing the power • Increase α. More conservative significance levels (lower α) yield lower power: using α = 0.01 will result in lower power than using α = 0.05. • The size of the effect is an important factor in determining power. Larger effects are easier to detect. • Increasing the sample size decreases the spread of the sampling distribution and therefore increases power. But there is a trade-off between the gain in power and the time and cost of testing a larger sample. • Decrease σ. A larger variance σ² implies a larger spread of the sampling distribution, σ/√n. Thus, the larger the variance, the lower the power. The variance is in part a property of the population, but it is possible to reduce it to some extent by carefully designing your study.
Type I and II errors • A Type I error is made when we reject the null hypothesis and the null hypothesis is actually true (we incorrectly reject a true H0). The probability of making a Type I error is the significance level α. • A Type II error is made when we fail to reject the null hypothesis and the null hypothesis is false (we incorrectly keep a false H0). The probability of making a Type II error is labeled β. The power of a test is 1 − β.
Type I and II errors • [Figure: the regions shaded in red represent the Type I error probability (α/2 in each tail), and the region shaded in blue represents the Type II error probability (β).]
Running a test of significance is a balancing act between the chance α of making a Type I error and the chance β of making a Type II error. Reducing α reduces the power of a test and thus increases β. It might be tempting to emphasize greater power (the more the better). • However, with "too much power" trivial effects become highly significant. • A Type II error is not definitive, since a failure to reject the null hypothesis does not imply that the null hypothesis is wrong.
Type I, II errors and power function • α and β are a trade-off. • An optimal test maximizes (1 − β), or equivalently minimizes β, for a given level of α. • The power of a test is influenced by n (the sample size), δ (the difference between the null and the alternative), and σ0 (the standard deviation of the random variable). • As n or δ gets larger, or σ0 gets smaller, the power becomes larger. • If, as the sample size n approaches infinity, a test has power approaching one against every alternative hypothesis, the test is called consistent.
Power function • [Figure: the power function, with the Type I and Type II error probabilities indicated.]
Power test: an example • {X1, X2, …, Xn} are i.i.d. N(μ0, 1) random variables, n = 36, and we test H0: μ0 = 3 versus H1: μ0 ≠ 3. • If the true mean is 3.25, the power can be computed from the rejection region of the two-sided test (see the sketch below). • If n increases to 64, the power increases substantially.
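A sketch of the power calculation; the slide does not state the significance level, so α = 0.05 is assumed here:

```python
import numpy as np
from scipy import stats

def power_two_sided(n, mu_true, mu0=3.0, sigma=1.0, alpha=0.05):
    """Power of the two-sided z test of H0: mu0 = 3 against a given true mean."""
    se = sigma / np.sqrt(n)
    z_star = stats.norm.ppf(1 - alpha / 2)       # ≈ 1.96
    upper = mu0 + z_star * se                    # rejection boundaries for x̄
    lower = mu0 - z_star * se
    return stats.norm.sf((upper - mu_true) / se) + stats.norm.cdf((lower - mu_true) / se)

print(power_two_sided(n=36, mu_true=3.25))   # ≈ 0.32
print(power_two_sided(n=64, mu_true=3.25))   # ≈ 0.52
```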