STATISTICS IN A NUTSHELL
Click on image for full .pdf article • Links in article to access datasets
Study Question: Has “age at first birth” of women in the U.S. changed over time?

Statistical Inference and Hypothesis Testing

POPULATION: Women in the U.S. who have given birth
“Random Variable” X = Age at first birth

Year 2010: Suppose we know that X follows a “normal distribution” (a.k.a. “bell curve”) in the population, with mean μ = 25.4 and standard deviation σ = 1.5. That is, X ~ N(25.4, 1.5).

Present: Is μ = 25.4 still true? Possible influences:
• public education, awareness programs
• socioeconomic conditions, etc.

“Null Hypothesis” H0: μ = 25.4
“Alternative Hypothesis” HA: μ ≠ 25.4 (2-sided), i.e., either μ < 25.4 or μ > 25.4

Random Sample of n = 400 individuals: {x1, x2, x3, x4, … , x400}
FORMULA: sample mean x̄ = (x1 + x2 + … + x400) / 400

Does the sample statistic x̄ tend to support H0, or refute H0 in favor of HA?
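The question of how much a sample mean varies from one random sample to another can be explored by simulation. The following is a minimal sketch, assuming H0 is true so that X ~ N(25.4, 1.5). The function name `simulate_sample_means`, the seed, and the choice of 2000 repetitions are illustrative assumptions, not part of the original slides.

```python
import random
import statistics

MU_0 = 25.4    # population mean under H0 (from the slides)
SIGMA = 1.5    # population standard deviation (from the slides)
N = 400        # sample size (from the slides)


def simulate_sample_means(num_samples, seed=0):
    """Draw num_samples random samples of size N from N(MU_0, SIGMA)
    and return the list of their sample means.

    The seed is an arbitrary choice that makes the run repeatable.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(MU_0, SIGMA) for _ in range(N)]
        means.append(statistics.fmean(sample))
    return means


sample_means = simulate_sample_means(num_samples=2000)

# Theory: the standard deviation of the sample mean is sigma / sqrt(n).
theoretical_se = SIGMA / N ** 0.5
# Experiment: the spread actually observed among the simulated means.
empirical_se = statistics.stdev(sample_means)

print(f"theoretical standard error: {theoretical_se:.4f}")
print(f"simulated standard error:   {empirical_se:.4f}")
print(f"average of sample means:    {statistics.fmean(sample_means):.3f}")
```

The simulated spread should sit close to the theoretical value σ/√n = 1.5/20 = 0.075, which is the quantity that drives the interval widths used in the test.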
In order to answer this question, we must account for the amount of variability of different x̄ values, from one random sample of n = 400 individuals to another. This can be approached through EXPERIMENT or through THEORY.

We will see three things:

95% CONFIDENCE INTERVAL FOR µ: 25.453 to 25.747
BASED ON OUR SAMPLE DATA, the true value of μ today is between 25.453 and 25.747, with 95% “confidence” (…akin to “probability”).

95% ACCEPTANCE REGION FOR H0: 25.253 to 25.547
IF H0 is true, then we would expect a random sample mean to lie between 25.253 and 25.547, with 95% probability.

“P-VALUE” of our sample: .00383
IF H0 is true, then we would expect a random sample mean that is at least 0.2 above 25.4 (as ours, x̄ = 25.6, was) to occur with probability .00383 (= 0.383%)… VERY RARELY! Counting deviations of at least 0.2 in either direction doubles this to about .0077, which is still very rare.
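The three quantities above can be recomputed from the numbers given in the slides. This is a sketch using only Python's standard library, not the author's own calculation. It assumes a z-test with known σ = 1.5, and the sample mean x̄ = 25.6 is inferred from the midpoint of the confidence interval 25.453 to 25.747 rather than stated outright.

```python
from statistics import NormalDist

MU_0 = 25.4      # null value of the mean (from the slides)
SIGMA = 1.5      # population standard deviation (from the slides)
N = 400          # sample size (from the slides)
X_BAR = 25.6     # sample mean, inferred from the interval midpoint (assumption)
ALPHA = 0.05     # significance level (from the slides)

standard_normal = NormalDist()                    # mean 0, standard deviation 1

se = SIGMA / N ** 0.5                             # standard error of the mean
z_crit = standard_normal.inv_cdf(1 - ALPHA / 2)   # about 1.96 for alpha = .05
margin = z_crit * se

# 95% confidence interval: centered on the sample mean.
confidence_interval = (X_BAR - margin, X_BAR + margin)

# 95% acceptance region: centered on the null value.
acceptance_region = (MU_0 - margin, MU_0 + margin)

# P-value: how rare a sample mean this far from MU_0 is, if H0 is true.
z = (X_BAR - MU_0) / se
upper_tail = 1 - standard_normal.cdf(z)   # P(sample mean >= X_BAR | H0)
two_sided = 2 * upper_tail                # P(|sample mean - MU_0| >= 0.2 | H0)

print("confidence interval:", [round(v, 3) for v in confidence_interval])
print("acceptance region:  ", [round(v, 3) for v in acceptance_region])
print("upper-tail p-value: ", round(upper_tail, 5))
print("two-sided p-value:  ", round(two_sided, 5))
```

Both intervals share the same half-width, 1.96 × 0.075 ≈ 0.147. They differ only in their center (x̄ versus the null value), which is why the two criteria always lead to the same decision.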
HOW CAN WE USE ANY OR ALL OF THESE THREE OBJECTS TO TEST THE NULL HYPOTHESIS H0: µ = 25.4?
Our data value (the sample mean, x̄ = 25.6) lies in the 5% REJECTION REGION, outside the 95% ACCEPTANCE REGION FOR H0 (25.253 to 25.547).

“P-VALUE” of our sample < SIGNIFICANCE LEVEL (α): .00383 is less than .05.
FORMAL CONCLUSIONS:
• The 95% confidence interval corresponding to our sample mean does not contain the “null value” of the population mean, μ = 25.4.
• The 95% acceptance region for the null hypothesis does not contain the value of our sample mean, x̄ = 25.6.
• The p-value of our sample, .00383 (about .0077 if both tails are counted), is less than the predetermined α = .05 significance level.
• Based on our sample data, we may reject the null hypothesis H0: μ = 25.4 in favor of the two-sided alternative hypothesis HA: μ ≠ 25.4, at the α = .05 significance level.

INTERPRETATION: According to the results of this study, there exists a statistically significant difference between the mean ages at first birth in 2010 (25.4 years old) and today, at the 5% significance level. Moreover, the evidence from the sample data suggests that the population mean age today is older than in 2010, rather than younger, by about 0.2 years.
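The three criteria in the formal conclusions can be written as three small decision rules that should agree with one another. The sketch below takes the interval endpoints and p-value as quoted in the slides. The function names are hypothetical, chosen for illustration only, and x̄ = 25.6 is inferred from the confidence interval midpoint.

```python
def rejects_by_confidence_interval(interval, null_value):
    """Reject H0 when the confidence interval misses the null value."""
    low, high = interval
    return not (low <= null_value <= high)


def rejects_by_acceptance_region(region, sample_mean):
    """Reject H0 when the sample mean falls outside the acceptance region."""
    low, high = region
    return not (low <= sample_mean <= high)


def rejects_by_p_value(p_value, alpha):
    """Reject H0 when the p-value is below the significance level."""
    return p_value < alpha


# Figures quoted in the slides (sample mean inferred from the interval midpoint).
decisions = {
    "confidence interval": rejects_by_confidence_interval((25.453, 25.747), 25.4),
    "acceptance region": rejects_by_acceptance_region((25.253, 25.547), 25.6),
    "p-value": rejects_by_p_value(0.00383, 0.05),
}

for criterion, reject in decisions.items():
    print(f"{criterion}: {'reject H0' if reject else 'do not reject H0'}")
```

All three rules point the same way here, which matches the slides' conclusion that H0: μ = 25.4 is rejected at the α = .05 level.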
SUMMARY: Why are these methods so important?
• They help to distinguish whether or not differences between populations are statistically significant, i.e., genuine, beyond the effects of random chance.
• Computationally intensive techniques that were previously intractable are now easily obtainable with modern PCs, etc.
• If your particular field of study involves the collection of quantitative data, then eventually you will either:
  1 - need to conduct a statistical analysis of your own, or
  2 - read another investigator’s methods, results, and conclusions in a book or professional research journal.
• Moral: You can run, but you can’t hide….