Inferential Statistics 2 Maarten Buis January 11, 2006
outline • Student Recap • Sampling distribution • Hypotheses • Type I and II errors and power • testing means • testing correlations
Sampling distribution • PrdV example from last lecture. • If H0 is true, then the population consists of 16 million persons, of which 41% (= 6.56 million persons) support the PrdV. • I have drawn 100,000 random samples of 2,598 persons each and computed the average support in each sample.
Sampling distribution • 5%, or 5,000 samples, have a mean of 39% or less. • So if we reject H0 when we find a support of 39% or less, then we will have a 5% chance of wrongly rejecting H0 when it is true. • Notice: we assume that the only reason we would make an error is random sampling error.
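The simulation described on these slides can be sketched as follows. This is a minimal illustration, not the lecturer's own code: it draws 1,000 samples rather than 100,000 to keep the run short, and it treats the 16 million persons as an infinite population (each person is drawn independently with probability .41). The names `sample_support`, `N_SAMPLES`, and `cutoff` are made up for this example.

```python
import random

random.seed(2006)  # fixed seed so the sketch is reproducible

P_H0 = 0.41         # support for the PrdV if H0 is true (from the slides)
SAMPLE_SIZE = 2598  # persons per sample (from the slides)
N_SAMPLES = 1000    # assumption: fewer than the 100,000 on the slide, for speed


def sample_support(p, n):
    """Return the share of supporters in one random sample of n persons."""
    supporters = sum(1 for _ in range(n) if random.random() < p)
    return supporters / n


# One mean per simulated sample, sorted so we can read off percentiles.
means = sorted(sample_support(P_H0, SAMPLE_SIZE) for _ in range(N_SAMPLES))

# Empirical 5th percentile: about 5% of the simulated samples lie at or below it.
cutoff = means[int(0.05 * N_SAMPLES)]
print(f"About 5% of the samples have a support of {cutoff:.3f} or less")
```

The cutoff printed here should land close to the 39% used on the slide; with more samples the estimate becomes less rough.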
More precise approach • We want to know the score below which only 5% of the samples lie. • Drawing lots of random samples is a rather rough approach; an alternative is to use the theoretical sampling distribution. • The proportion is a mean, and the sampling distribution of a mean is the normal distribution with a mean equal to the H0 value and a standard deviation (called the standard error) of $se = \sigma / \sqrt{N}$
More precise approach • For a standard normal distribution we know the z-score below which 5% of the samples lie (Appendix 2, table A): -1.645 • So if we compute a z-score for the observed value (.31) and it is below -1.645, we can reject the H0, and we will do so wrongly in only 5% of the cases
More precise approach • μ is the mean of the sampling distribution, so .41 (H0) • se is $\sigma / \sqrt{N}$, σ of a proportion is $\sqrt{p(1-p)}$ • so the se is $\sqrt{.41 \times .59} / \sqrt{2598} \approx .00965$ • so the z-score is $z = (x - \mu) / se = (.31 - .41) / .00965 \approx -10.4$ • -10.4 is less than -1.645, so we reject the H0
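The z-test for the PrdV example can be checked with a few lines of Python. This is a sketch under the numbers given on the slides (H0 of .41, observed support of .31, samples of 2,598 persons); the variable names are illustrative, and the critical value is computed from the standard normal distribution with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.41     # support under H0 (from the slides)
p_obs = 0.31  # observed support (from the slides)
n = 2598      # sample size (from the slides)

sigma = sqrt(p0 * (1 - p0))  # standard deviation of a proportion under H0
se = sigma / sqrt(n)         # standard error of the sample proportion
z = (p_obs - p0) / se        # z-score of the observed value

# z-score below which 5% of a standard normal distribution lies.
z_crit = NormalDist().inv_cdf(0.05)

reject_h0 = z < z_crit
print(f"se = {se:.5f}, z = {z:.1f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
```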
Null Hypothesis • A sampling distribution requires you to imagine what the population would look like if H0 is true. • This is possible if H0 is one value (41%) • This is impossible if H0 is a range (<41%) • So H0 should always contain an equal sign (either = or ≤ or ≥)
Null hypothesis • In practice the H0 is almost always 0, e.g.: • difference between two means is 0 • correlation between two variables is 0 • regression coefficient is 0 • This is so common that SPSS always assumes that this is the H0.
Undirected Alternative Hypotheses • Often we have an undirected alternative hypothesis, e.g.: • the difference between two means is not zero (could be either positive or negative) • the correlation between two variables is not zero (could be either positive or negative) • the regression coefficient is not zero (could be either positive or negative)
Directed alternative hypothesis • In the PrdV example we had a directed alternative hypothesis: Support for PrdV is less than 41%, since PrdV would have still participated if his support were more than 41%.
Type I error rate • You choose the type I error rate (α) • It is independent of sample size, type of alternative hypothesis, or model assumptions.
Type I versus type II error rate • a low probability of rejecting H0 when H0 is true (type I error), is obtained by: • rejecting the H0 less often, • Which also means a higher probability of not rejecting H0 when H0 is false (type II error), • In other words: a lower probability of finding a significant result when you should (power).
How to increase your power: • Accept a higher type I error rate (a lower type I error rate costs power) • Larger sample size • Use a directed instead of an undirected alternative hypothesis • Use more assumptions in your model (non-parametric tests make fewer assumptions, but also have less power)
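The trade-offs listed above can be made concrete with an approximate power calculation for a z-test on a proportion. This is a rough sketch, not part of the lecture: the true support of .39, the sample sizes, and the function name `power_lower_tail` are all hypothetical choices for illustration, and the two-sided variant ignores the (here negligible) chance of rejecting in the upper tail.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_lower_tail(p0, p_true, n, alpha, two_sided=False):
    """Approximate probability of rejecting H0: p = p0 when the true value is p_true < p0."""
    tail = alpha / 2 if two_sided else alpha      # an undirected test splits alpha over two tails
    se_h0 = sqrt(p0 * (1 - p0) / n)               # standard error if H0 is true
    se_true = sqrt(p_true * (1 - p_true) / n)     # standard error at the true value
    cutoff = p0 + STD_NORMAL.inv_cdf(tail) * se_h0  # reject H0 below this sample proportion
    return STD_NORMAL.cdf((cutoff - p_true) / se_true)


# Hypothetical scenario: H0 says .41, the true support is .39.
baseline = power_lower_tail(0.41, 0.39, n=500, alpha=0.05)
lower_alpha = power_lower_tail(0.41, 0.39, n=500, alpha=0.01)
larger_n = power_lower_tail(0.41, 0.39, n=2000, alpha=0.05)
undirected = power_lower_tail(0.41, 0.39, n=500, alpha=0.05, two_sided=True)

print(f"baseline power:           {baseline:.2f}")
print(f"lower type I error rate:  {lower_alpha:.2f}")
print(f"larger sample:            {larger_n:.2f}")
print(f"undirected alternative:   {undirected:.2f}")
```

In this scenario a stricter type I error rate and an undirected alternative both give less power than the baseline, while the larger sample gives more.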
Testing means • What kind of hypotheses might we want to test: • Average rent of a room in Amsterdam is 300 euros • Average income of males is equal to the average income of females
Z versus t • In the PrdV example we knew everything about the sampling distribution with only a hypothesis about the mean. • In the rent example we don’t: we have to estimate the standard deviation. • This adds uncertainty, which is why we use the t distribution instead of the normal • Uncertainty declines when sample size becomes larger. • In large samples (N>30) we can use the normal.
t-distribution • It has a mean and standard error like the normal distribution. • It also has degrees of freedom, which depend on the sample size • The larger the degrees of freedom, the closer the t-distribution is to the normal distribution.
Rent example • H0: μ = 300, HA: μ ≠ 300 • We choose α to be 5% • N = 19, so df = 18 • We reject H0 if we find a t less than -2.101 or more than 2.101 (appendix B, table 2) • We do not reject H0 if we find a t between -2.101 and 2.101.
Rent example • We use $s^2$ as an estimate of $\sigma^2$ • So $t = (\bar{x} - \mu) / (s / \sqrt{N}) = -1.85$ • -1.85 is between -2.101 and 2.101, so we do not reject H0
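The one-sample t-test used in the rent example can be sketched as below. The rents in `hypothetical_rents` are invented for illustration (the lecture's data are not in the slides), so the resulting t will not equal -1.85; the critical value 2.101 for df = 18 is the one quoted on the slide, and the function names are assumptions of this example.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF18 = 2.101  # two-sided 5% critical value for df = 18 (from the slides)


def t_statistic(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(N)), with s the sample standard deviation."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)  # stdev divides by N - 1, so s**2 estimates sigma**2
    return (mean(sample) - mu0) / se


def reject_two_sided(t, t_crit=T_CRIT_DF18):
    """Reject H0 when t falls outside the interval from -t_crit to t_crit."""
    return abs(t) > t_crit


# Hypothetical monthly rents in euros for N = 19 rooms (made-up data).
hypothetical_rents = [250, 310, 280, 295, 330, 270, 260, 305, 285, 240,
                      320, 275, 290, 265, 300, 255, 315, 245, 280]

t = t_statistic(hypothetical_rents, mu0=300)
print(f"t = {t:.2f}, reject H0: {reject_two_sided(t)}")

# The slide's own result: t = -1.85 lies between -2.101 and 2.101.
print(f"slide value t = -1.85, reject H0: {reject_two_sided(-1.85)}")
```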
Do before Monday • Read Chapters 9 and 10 • Do the “For solving Problems”