Review
You’re hired to test whether a company’s hiring practices are gender-biased. You record whether each new hire is male or female. What’s the null hypothesis?
• It is impossible to know the company’s true hiring preferences
• Men and women are equally likely to be hired
• Women are more likely to be hired
• Men are more likely to be hired
Review
What would be a Type I Error for this experiment?
• Concluding the company is biased when it isn’t
• Concluding the company isn’t biased when it is
• Correctly concluding the company is biased
• Correctly concluding the company is unbiased
Review
You decide in advance to record the first 8 new employees. Here are the probabilities of what will happen, according to the null hypothesis:

Number of men   0    1    2    3    4    5    6    7    8
Probability    .00  .03  .11  .22  .27  .22  .11  .03  .00

If your alpha level is 5%, how many of the hires need to be male for you to conclude the company is biased toward hiring men?
• 0.40
• 6
• 7
• 7.95
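The probabilities in this question can be reproduced from the binomial distribution. The sketch below assumes the null hypothesis means each hire is male with probability 0.5, independently of the others, and that the test is one-tailed (bias toward men only). The function names are illustrative, not part of the lecture.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """P(exactly k men among n hires) under the null hypothesis."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k, n, p=0.5):
    """P(k or more men among n hires) under the null hypothesis."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

def critical_count(n, alpha, p=0.5):
    """Smallest count k whose upper-tail probability is at most alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None  # no count is extreme enough at this alpha

n_hires = 8
alpha = 0.05
for k in range(n_hires + 1):
    # columns: number of men, P(exactly k), P(k or more)
    print(k, round(binomial_pmf(k, n_hires), 2), round(upper_tail(k, n_hires), 3))
print("critical count:", critical_count(n_hires, alpha))
```

The cutoff is the smallest count whose tail probability (that count or anything more extreme) stays at or below alpha, which is why the tail sum rather than the single-count probability is compared with 5%.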
Sampling Distributions 10/03
Same Thought Experiment
• Known population
• Sample n members
• Compute some statistic
• What is probability distribution of the statistic?
• Replication
  • Doing exactly the same experiment but with a new sample
  • Sampling variability means each replication will result in different value of statistic
Sampling Distributions
• Sampling distribution
  • The probability distribution of some statistic over repeated replication of an experiment
• Distribution of sample means: the probability distribution for M

[Figure: population, sample(s), and sampling distribution]

X = [ -.70 1.10 -.36 -.68 -.08], M = -.144
X = [ .09 -.88 1.16 -1.72 .40], M = -.190
X = [ -.99 .47 .65 1.52 .20], M = .370
X = [ -.84 -2.06 1.06 -.24 2.49], M = .082
X = [ 1.88 .57 .38 .16 .85], M = .768
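As a small illustration of sampling variability, the sketch below recomputes M for each of the five samples listed on this slide. Only the data values come from the slide; the variable names are illustrative.

```python
from statistics import mean

# The five samples of n = 5 scores shown on the slide.
samples = [
    [-0.70, 1.10, -0.36, -0.68, -0.08],
    [0.09, -0.88, 1.16, -1.72, 0.40],
    [-0.99, 0.47, 0.65, 1.52, 0.20],
    [-0.84, -2.06, 1.06, -0.24, 2.49],
    [1.88, 0.57, 0.38, 0.16, 0.85],
]

# One statistic (the sample mean M) per replication.
sample_means = [mean(x) for x in samples]

for x, m in zip(samples, sample_means):
    print(x, "M =", round(m, 3))
```

Each replication gives a different M even though every sample comes from the same population; the distribution of those values over many replications is the sampling distribution of the mean.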
Reliability of the Sample Mean
[Figure labels: µ, M, 2σ_M]
• How close is M to µ?
  • Tells how much we can rely on M as estimator of µ
• Standard Error (SE or σ_M)
  • Typical distance from M to µ
  • Standard deviation of p(M)
• Estimating µ from M
  • If we know M, we can assume µ is within about 2 SEs
  • If µ were further, we probably wouldn’t have gotten this value of M
• SE determines reliability
  • Low SE → high reliability; high SE → low reliability
  • Depends on sample size and variability of individual scores
• Law of Large Numbers
  • The larger the sample, the closer M will be to µ
  • Formally: as n goes to infinity, SE goes to 0
  • Implication: more data means more reliability
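The "within about 2 SEs" claim can be checked by simulation. The sketch below is a rough illustration under stated assumptions: the population is Normal with µ = 100 and σ = 10 (values chosen for the example, not taken from this slide), and SE is computed with the standard formula σ/√n. The function name and the number of replications are arbitrary.

```python
import random
from math import sqrt

def coverage_within_2se(mu, sigma, n, replications=5_000, seed=0):
    """Fraction of replications in which M lands within 2 SEs of mu."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)  # standard error of the mean (assumed formula)
    hits = 0
    for _ in range(replications):
        m = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(m - mu) <= 2 * se:
            hits += 1
    return hits / replications

for n in (4, 25, 100):
    se = 10 / sqrt(n)
    # SE shrinks as n grows; the fraction within 2 SEs stays roughly constant
    print(n, round(se, 2), coverage_within_2se(mu=100, sigma=10, n=n))
```

The fraction stays near 95% for every n, while the width of the ±2 SE band shrinks as n grows, which is the sense in which more data means more reliability.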
Central Limit Theorem
• Characterizes distribution of sample mean
  • Deep mathematical result
  • Works for any population distribution
• Three properties of p(M)
  • Mean: The mean of p(M) always equals µ
  • Standard error: The standard deviation of p(M), σ_M, equals σ/√n
    • Variance equals σ²/n
  • Shape: As n gets large, p(M) approaches a Normal distribution
• All in one equation: p(M) ≈ Normal(µ, σ/√n)
  • Only Normal if n is large enough!
  • Rule of thumb: Normal if n ≥ 30
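The three properties can be seen in a simulation with a clearly non-Normal population. The sketch below assumes an exponential population with rate 1 (so µ = 1 and σ = 1), chosen only because it is strongly skewed; the function names, seed, and replication count are illustrative.

```python
import random
from math import sqrt

def sample_means(n, replications=10_000, seed=1):
    """Draw many samples of size n from an exponential population; return each M."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(replications)]

def describe(values):
    """Mean, standard deviation, and skewness of a list of numbers."""
    k = len(values)
    m = sum(values) / k
    sd = sqrt(sum((v - m) ** 2 for v in values) / k)
    skew = sum(((v - m) / sd) ** 3 for v in values) / k
    return m, sd, skew

for n in (2, 30):
    m, sd, skew = describe(sample_means(n))
    print(f"n={n}: mean={m:.3f} sd={sd:.3f} "
          f"(sigma/sqrt(n)={1 / sqrt(n):.3f}) skew={skew:.2f}")
```

For both sample sizes the mean of the sample means sits near µ and their standard deviation near σ/√n, but the skewness is much closer to 0 (the Normal value) at n = 30 than at n = 2.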
Distribution of Sample Variances
• Same story for s² as for M
  • Probability distribution for s² over repeated replication
• Chi-square distribution (χ²)
  • Probability distribution for sample variance
  • Positive skew; variance sensitive to outliers
  • Mean equals σ², because s² is unbiased
  • Recall: s² = Σ(X − M)² / (n − 1)
• Distribution of ?
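A simulation can illustrate both the unbiasedness of the sample variance and its positive skew. The sketch below assumes a Normal population with σ = 2 (so σ² = 4) and samples of n = 5; these values, the seed, and the function names are chosen for illustration only.

```python
import random

def variance(xs, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

def simulate(n=5, sigma=2.0, replications=20_000, seed=2):
    """Return sample variances computed with n - 1 and with n as divisor."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(replications):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        unbiased.append(variance(xs, n - 1))
        biased.append(variance(xs, n))
    return unbiased, biased

unbiased, biased = simulate()
mean_unbiased = sum(unbiased) / len(unbiased)
mean_biased = sum(biased) / len(biased)
median_unbiased = sorted(unbiased)[len(unbiased) // 2]

print("mean of variances, divide by n - 1:", round(mean_unbiased, 2))
print("mean of variances, divide by n:    ", round(mean_biased, 2))
print("median of variances (n - 1):       ", round(median_unbiased, 2))
```

Dividing by n − 1 gives variances that average out near σ², while dividing by n underestimates it on average. The median falling below the mean reflects the positive skew of the distribution.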
Review
The shape of the distribution of sample means
• Is always normal
• Is approximately normal when sample size is big enough
• Is always chi-square
• Depends on whether you divide by n or by n – 1
Review
As the size of a sample increases
• The expected value of M gets closer to µ
• The expected value of M gets further from µ
• The standard error of M gets smaller
• The standard error of M gets larger
Review
A population has a mean of µ = 100 and a standard deviation of σ = 10. 400 researchers each measure a sample of 25 people. Everyone then reports their sample means. What should be the standard deviation of the 400 sample means?
• 0.2
• 0.5
• 10
• 20
Correct answer is 2 (σ_M = σ/√n = 10/√25 = 2)