You want to survey a school • You draw your sample from the first day of school student enrollment list • This list would be your ____???____ • Which students are not on this list? • A phenomenon known as? • Potentially problematic because? • (Hint: Dillman, p. 196)
Some reminders… • Population: The group about whom we want to draw our inference • Sample Frame: Members of the population who could potentially be in our sample • Coverage Error: The extent to which members of the population are excluded from the sample frame (not good)
Welcome… • …to a hopefully productive lesson on SAMPLING METHODOLOGY! • What’s ideal? • Nifty tricks?? • Common misconceptions??? • Limitations of our methods????????? • P.S. We are going to do (some) math and it is going to be FUN!!!
Simple Random Sampling(what’s ideal) • Members of a sample frame, which hopefully includes our entire population, are selected one at a time • independently & without replacement • (Drawing names out of a hat) • Sample is equal in expectation to population on all outcomes, but no guarantees
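A rough sense of what "drawing names out of a hat" looks like in code (a minimal sketch; the frame of 20,000 "students" and the sample size of 1,100 are made-up numbers for illustration):

```python
import random

# Hypothetical sample frame: the first-day enrollment list
sample_frame = ["student_%d" % i for i in range(20000)]

# random.sample() draws without replacement; every member is equally likely to be chosen
srs = random.sample(sample_frame, k=1100)
```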
Stratified Random Sampling(possibly even more ideal) • Use criterion to divide sample frame by group membership (e.g. racial category) • Randomly sample within each group • What is the advantage of this procedure?
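A minimal sketch of the procedure (the group labels, frame size, and per-group sample size here are hypothetical):

```python
import random
from collections import defaultdict

# Hypothetical frame: each member is tagged with a group label (e.g. racial category)
frame = [("student_%d" % i, random.choice(["Group A", "Group B", "Group C"]))
         for i in range(20000)]

# Divide the frame by group membership...
strata = defaultdict(list)
for member, group in frame:
    strata[group].append(member)

# ...then randomly sample within each group
stratified_sample = {g: random.sample(members, k=1100) for g, members in strata.items()}
```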
Scenario… • We want to know what percentage of Americans support Obama for president • We need 1100 members from each racial group to be confident about group means (more on this later) • American Indians / Alaskan Natives comprise 1% of our population. • Through simple random sampling, how large of a sample would we theoretically need to reach n = 1100 for this subgroup?
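Back-of-the-envelope answer (assuming the subgroup really is exactly 1% of the frame): to expect 1,100 American Indian / Alaska Native respondents from a simple random sample, we would need roughly 1,100 / 0.01 = 110,000 respondents overall.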
Scenario cont’d… • OR, we could use stratified random sampling and draw 1100 from each subgroup without all this trouble. • BUT, now we have oversampled American Indians / Alaska Natives; they are over-represented in our sample! • Implications? • Solutions?
(This data is very fake) • Proportion supporting B.O.: • African American: .50 • Asian American: .50 • Latino: .50 • White: .50 • American Indian: 0 • Unweighted avg: ??
Weighting (nifty trick) • Now, let’s do a weighted average instead: 99% (.50) + 1% (0) = 49.50% • What’s going on here? • Big difference, eh?
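A sketch of both averages in code. Only the 1% American Indian / Alaska Native share and the fake .50 / 0 support proportions come from the slides; the other population shares below are invented purely so the weights sum to 1:

```python
support = {"African American": .50, "Asian American": .50, "Latino": .50,
           "White": .50, "American Indian": 0.0}

# Hypothetical population shares (illustrative only; they just need to sum to 1)
shares = {"African American": .13, "Asian American": .06, "Latino": .18,
          "White": .62, "American Indian": .01}

unweighted = sum(support.values()) / len(support)          # 0.40
weighted = sum(support[g] * shares[g] for g in support)    # 0.495
```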
So, why was 1100 an ideal subgroup number? • Because no matter how large your population, a sample of 1100 will get you very close to the true population value if your outcome is binary (e.g. Obama: Yes or No) • How come?
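The usual back-of-the-envelope version of this claim (assuming a 95% confidence level and a ±3 percentage point margin of error, neither of which is stated on the slide): with the worst-case binary variance p(1 - p) = .25, n = (1.96² × .25) / (.03²) ≈ 1,067, which rounds up to the familiar ~1,100.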
Because this man said so • William Sealy Gosset (1876-1937) • Chemist, “math person”, Guinness Brewery worker • A patient man
Yes, a patient man • Using barley (somehow), he spent two years empirically studying the relationship between sample means and population means. • “The Probable (Standard) Error of a Mean” (1908) • Standard errors are what we use to estimate sampling error
Sampling error • Describes how closely our sample mean allows us to estimate our population mean • Conceptually similar to a confidence interval (Dillman, p. 207; http://www.researchsolutions.co.nz/sample_sizes.htm) • Depends on: • Population variance (“spread”), estimated by the sample variance • Sample size • Population size (to a point)
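A minimal sketch of a textbook version of this calculation (a simple-random-sample standard error with a finite population correction; this is a generic formula, not necessarily the exact one Dillman prints):

```python
import math

def sampling_error(p, n, N):
    """Approximate standard error of a sample proportion from a simple random sample."""
    se = math.sqrt(p * (1 - p) / n)       # variance term: shrinks as n grows
    fpc = math.sqrt((N - n) / (N - 1))    # finite population correction ("to a point")
    return se * fpc

# p = .5 is the worst case; n = 1,100 drawn from a very large population
print(sampling_error(p=0.5, n=1100, N=300_000_000))  # about 0.015, i.e. roughly +/- 3 points at 95%
```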
Sampling error: big picture • Larger variances and (to a point) larger population sizes require larger samples to estimate the population mean at a given level of precision • Increasing sample size reduces sampling error, BUT there are diminishing returns to increasing our sample size
Sampling error: big picture • Diminishing returns? For large populations… • Increasing “n” from 100 to 200 is helpful • Increasing from 500 to 600 is less helpful • Increasing from 1200 to 1300 helps very little (no matter how large the population) • (See the sketch below)
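The sketch below makes the diminishing returns concrete (a binary outcome at its worst-case variance of .25, population treated as effectively infinite):

```python
import math

for n in (100, 200, 500, 600, 1200, 1300):
    moe = 1.96 * math.sqrt(0.25 / n)   # approximate 95% margin of error
    print(n, round(moe, 3))
# 100 -> 0.098, 200 -> 0.069, 500 -> 0.044, 600 -> 0.040, 1200 -> 0.028, 1300 -> 0.027
```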
Why Diminishing Returns? • Because there is an upper bound (“ceiling”) on the variance of any sample. • For binary (yes/no, “1” or “0”) outcomes, the maximum variance is .25 • Thus, as “n” in the denominator grows, the standard error quickly becomes very small
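Why .25 is the ceiling: a binary outcome coded 0/1 with proportion p has variance p(1 - p), which is largest at p = .5, giving .5 × .5 = .25.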
Why Diminishing Returns? • Even for continuous outcomes, there is still an upper bound on variance unless the scale is infinite • Thus, there are still diminishing returns on increasing “n” • For more on this topic… • Take S-012 • Look up confidence intervals in stats books • “You don't need a large sample of users to obtain meaningful data: Continuous Data (e.g. Task Time)” http://www.measuringusability.com/sample_continuous.htm
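A quick illustration of the bounded-scale point (using a hypothetical 1-to-7 survey scale, not from the slides): the most spread-out sample possible puts half the responses at 1 and half at 7, giving a standard deviation of 3, so the standard error of the mean can never exceed 3/√n, and the same diminishing-returns logic applies.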
Limitations of sampling error calculations • Does not take coverage error into account! • Assumes you have drawn a simple random sample (e.g. does not take “clustering” into account)
Clustering??? • There are 20,000 students in a city with 40 schools. We want a sample of 1100 • Ideally, we would draw students at random from every school. • But, it would be cheaper and easier if we drew a few schools at random and obtained information from every student • Implications?
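A minimal sketch of that cheaper "cluster" design (school count and sizes here are hypothetical: 40 schools of 500 students each; drawing 3 whole schools gives about 1,500 students, a bit over the 1,100 target):

```python
import random

# Hypothetical city: 40 schools x 500 students = 20,000 students
schools = {"school_%d" % i: ["s%d_%d" % (i, j) for j in range(500)] for i in range(40)}

chosen_schools = random.sample(list(schools), k=3)  # stage 1: draw a few schools at random
cluster_sample = [student                           # stage 2: take every student in them
                  for school in chosen_schools
                  for student in schools[school]]
```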
Clustering??? • If there is a lot of school-level variation in our outcome, our sample will not be representative and our sample estimate will be biased. • Sampling error formula does not account for this possibility
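One standard way to quantify how badly clustering can hurt (Kish's design effect, not covered in the slides): deff ≈ 1 + (m - 1)ρ, where m is the cluster size and ρ measures how strongly students within the same school resemble each other on the outcome. With m = 500 and even a modest ρ = .05, deff ≈ 26, so ~1,500 clustered students carry roughly the information of 1,500 / 26 ≈ 58 independently sampled students.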
One more limitation of sampling error formula • Non-response bias • Even if you have drawn a beautifully random sample, your sample estimate will be biased if those who do not return your survey are different on your outcome of interest. • That’s why Dillman’s advice on getting high response rates is so important!