Introduction to Sampling “If you don’t believe in sampling, the next time you have a blood test tell the doctor to take it all.”
To date we’ve learned to display, describe and summarize data and to examine relationships between variables; but so far we have been limited to examining data we’re given. • Now we need to use our knowledge and skills to answer questions of interest to us – by collecting our own data…
Three Important Ideas 1: Sampling: Examining part of a whole… 2: Randomization: Choosing randomly!! 3: Sample Size: It’s all about the sample size (the population size doesn’t matter).
Idea 1: Sampling • We often want to know something about a population; but examining each individual is impractical or impossible. • To combat this problem, we examine a small but representative group of individuals – called a sample – from the population.
Examples: • Think about cooking: to find out how your meal will taste, you taste a spoonful; you don’t eat the whole pot. • Opinion polls are samples; they’re designed to ask questions of a small group of people to learn something about the opinions of an entire population.
Types of Bias • Voluntary Response Bias: • Bias introduced when participants self-select. • Under-coverage Bias: • Bias introduced by a sampling method that leaves out part of the population, so the sample is not fully representative. • Response Bias: • Something in a survey’s design that influences responses, including question wording, interview techniques, etc.
Types of Bias (cont’d) • Non-Response Bias: • Bias introduced when a large portion of the target sample fails to respond, and those who do respond are likely not representative of the population of interest.
Bias • Bias is the bane of sampling: the one thing that must be avoided at all costs! • There is usually no way to fix a biased sample and no way to salvage useful information from it. • The easiest way to avoid bias is to select individuals for the sample at random. • The deliberate introduction of randomness to guard against bias is one of the great insights of Statistics.
Bad Sampling Methods Convenience sampling: Just ask whoever is around. • Example: “Man on the street” survey (cheap, convenient) • BUT… which men, and on what street? • Ask about legalizing marijuana “on the street” in Boston and then in some small town in Idaho, and you would probably get very different answers. Even within one area, answers may differ: think about this question asked outside a church, then again outside a bar. • Bias: under-coverage (the sample is limited to those who happen to be present).
Voluntary Response Sampling: • Samples of individuals who choose to be involved. These samples are very susceptible to bias because the people who respond are those motivated (one way or another) to do so. They are often presented as “public opinion polls,” but they are not considered valid or scientific. • Bias: the sample design systematically favors a particular outcome (voluntary response bias). Example: Ann Landers summarized the responses of readers and reported that 70% of the 10,000 parents who wrote in said that having kids wasn’t worth it: if they had to do it over again, they wouldn’t! Bias: most letters to newspapers are written by disgruntled people. A random sample showed that 91% of parents WOULD have kids again.
Online surveys: Bias (voluntary response): people have to care enough about an issue to bother replying. This sample is probably a mix of people who hate “wasting the taxpayers’ money” and “animal lovers.” Is this representative of everyone?
Sampling Terms • Population (of interest): • The group we’re interested in drawing conclusions about. • Sampling Frame: • A list of individuals from which a sample is drawn. • Target Sample: • The group that you plan (or hope!) to sample. • Sample: • The actual group you end up with when you’re done. (Image: NE Patriot Fans)
Sampling Terms (cont’d) • Sampling Variability (“Error”): Samples drawn at random differ from one another. These differences lead to different values for the statistics we measure. • Strata: Homogeneous portions of a larger population. • Cluster: A small section of a population that represents the entire population. (Image: Pats Season Ticket Holders)
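Sampling variability can be seen in a quick simulation. This is a minimal Python sketch, assuming a hypothetical population of 10,000 fans in which 60% answer “yes” (coded 1); each simple random sample of 100 fans gives a somewhat different sample proportion, even though the population itself never changes.

```python
import random

def sample_proportion(population, n, rng):
    """Draw a simple random sample of size n (without replacement)
    and return the fraction of sampled individuals coded 1."""
    sample = rng.sample(population, n)
    return sum(sample) / n

# Hypothetical population: 6,000 "yes" (1) and 4,000 "no" (0).
population = [1] * 6000 + [0] * 4000

rng = random.Random(1)  # fixed seed so the run is repeatable
proportions = [sample_proportion(population, 100, rng) for _ in range(5)]
print(proportions)  # five sample proportions scattered around 0.6
```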
Sampling Methods • Simple Random Sample (SRS): • Each person in the population of interest has an equal chance of being selected (and every possible group of that size is equally likely to be chosen). • Stratified Sampling: • The population is divided into strata (homogeneous groups) before the target sample is selected. An SRS is then selected from each stratum. • Stratified sampling can reduce sampling variability and highlight important differences between groups.
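A minimal Python sketch of the two methods, assuming a hypothetical sampling frame of 800 students grouped by grade (the strata) and equal allocation of 10 students per grade; the function names are illustrative, not from any particular library.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every individual has the same chance of selection,
    and every group of size n is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS of size n_per_stratum from each stratum
    and combine the results."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical frame: 200 students in each grade; the grades are the strata.
strata = {
    grade: [f"{grade}-{i}" for i in range(200)]
    for grade in ("9", "10", "11", "12")
}
frame = [s for members in strata.values() for s in members]

rng = random.Random(42)
srs = simple_random_sample(frame, 40, rng)
strat = stratified_sample(strata, 10, rng)
```

The stratified sample always contains exactly 10 students from each grade, while the SRS of 40 can, by chance, over- or under-represent a grade.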
Sampling Methods (cont’d) • Cluster Sampling: • The population is split into clusters, each of which represents the entire population. Several clusters are then selected at random, and a census is performed within each chosen cluster. • Multi-Stage Sampling: • Sampling methods that combine several other methods are called multi-stage samples.
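Cluster and multi-stage sampling can be sketched the same way. This assumes hypothetical homerooms as the clusters (30 rooms of 25 students each); the cluster sample takes a census of each randomly chosen room, while the two-stage version takes an SRS inside each randomly chosen room. The function names are illustrative only.

```python
import random

def cluster_sample(clusters, k, rng):
    """Randomly choose k whole clusters, then take a census of each:
    every member of a chosen cluster is included."""
    chosen = rng.sample(list(clusters), k)
    return [member for c in chosen for member in clusters[c]]

def two_stage_sample(clusters, k, m, rng):
    """Multi-stage: randomly choose k clusters, then take an SRS of m
    members within each chosen cluster instead of a census."""
    chosen = rng.sample(list(clusters), k)
    return [member for c in chosen for member in rng.sample(clusters[c], m)]

# Hypothetical clusters: 30 homerooms of 25 students each.
homerooms = {
    room: [f"room{room}-student{i}" for i in range(25)]
    for room in range(30)
}

rng = random.Random(7)
cluster = cluster_sample(homerooms, 3, rng)           # 3 rooms x 25 = 75 students
multistage = two_stage_sample(homerooms, 6, 5, rng)   # 6 rooms x 5 = 30 students
```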
Sampling Methods (cont’d) • Systematic Sampling: • Individuals are selected systematically from a sampling frame (for example, every 10th name on a list); the starting point must be chosen at random. • Pilot: • A small trial run of a survey to check whether the questions are clear, so that biased or confusing questions can be corrected before the full survey.
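A minimal Python sketch of systematic sampling, assuming a hypothetical alphabetical roster of 1,000 students and a target of 50: the interval is k = 1000 / 50 = 20, and the random start falls somewhere among the first 20 names.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every k-th individual from the frame, where k = len(frame) // n,
    starting at a randomly chosen position among the first k.
    Assumes len(frame) >= n."""
    k = len(frame) // n
    start = rng.randrange(k)  # random start: 0, 1, ..., k - 1
    return frame[start::k][:n]

# Hypothetical alphabetical roster of 1,000 students.
roster = [f"student{i:04d}" for i in range(1000)]
sample = systematic_sample(roster, 50, random.Random(3))  # k = 20
```

One caution: if the list has a repeating pattern whose period matches k, a systematic sample can be biased.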
Sampling Methods (example) • The Principal is interested in student opinions on the school’s attendance policy… • Population of Interest? • Simple Random Sample (SRS) • Sampling Frame? Target Sample? • Sample? Problems? • Systematic Sample • Sampling Frame? Target Sample? • Sample? Problems? PMVHS Attendance Policy
Sampling Methods (example) • The Principal is interested in student opinions of the school’s attendance policy… • Population of Interest? • Cluster Sample • Clusters? Target Sample? • Sample? Problems? • Stratified Sample • Strata? Sampling Frame? • Target Sample? Sample? • Problems?
Sampling Methods (example) • The Principal is interested in student opinions of the school’s attendance policy… • Population of Interest? • Multi-Stage Sample • Target Sample? • Sample? • Problems? • Which is the best method? • IT DEPENDS!!
Three Important Ideas 1: Sampling: Examining part of a whole… 2: Randomization: Choosing randomly!! 3: Sample Size: It’s all about the sample size (the population size doesn’t matter).
Idea 2: Randomization • Randomization minimizes bias.
Idea 3: It’s all about the Sample Size • It’s the size of the sample, not the size of the population or the fraction of the population you’ve sampled, that matters (as long as the population is much larger than the sample).
Which Survey is Most Accurate? 1) In the city of Peabody, 1,000 likely voters are randomly selected and asked who they are going to vote for in the Peabody mayoral race. 2) In the state of Massachusetts, 1,000 likely voters are randomly selected and asked who they are going to vote for in the Massachusetts governor's race. 3) In the United States, 1,000 likely voters are randomly selected and asked who they are going to vote for in the presidential election. Answer: All the surveys have essentially the same accuracy.
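The answer can be checked with the usual approximate 95% margin of error for a sample proportion, z * sqrt(p(1 - p) / n), multiplied by the finite population correction sqrt((N - n) / (N - 1)). The sketch below assumes p = 0.5 (the most conservative choice) and hypothetical round numbers of likely voters for each race; they are for illustration only, not actual counts.

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion from an SRS
    of size n drawn from a population of size N, including the
    finite population correction sqrt((N - n) / (N - 1))."""
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return z * standard_error * fpc

# Hypothetical population sizes (likely voters), chosen only for illustration.
for place, N in [("city", 30_000), ("state", 3_000_000), ("country", 150_000_000)]:
    print(f"{place:8s} N={N:>11,}  margin of error = {margin_of_error(1000, N):.3f}")
```

With n = 1,000, all three margins come out close to 0.03 (about plus or minus 3 percentage points); the correction factor barely matters unless the sample is a sizable fraction of the population.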
Does a Census Make Sense? • Why bother with sampling and worrying about sample size, bias, etc.? • Wouldn’t it be better to include everyone, to “sample” an entire population? • Well… sometimes! • Such a special sample is called a census.
Does a Census Make Sense? (cont.) • There are problems with taking a census: • Practicality: it can be difficult to complete a census, since individuals can be hard to locate. • Timeliness: populations don’t stand still. Even if you could take a census, the population changes while you work. • Expense: taking a census is much more expensive than sampling (the U.S. Census: $14.7 billion). • Accuracy: a census may not be as accurate as a good sample (data entry errors, tedium, etc.).
Population vs. Sample • Population: The entire group of individuals we are interested in but can’t get to directly. Examples: all humans, all working-age people in New England, all crickets, all high school students. A parameter is a number describing a characteristic of the population. • Sample: The part of the population we actually examine (for which we do have data). A statistic is a number describing a characteristic of a sample.
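Notation: parameters are usually written with Greek letters (such as μ and σ), statistics with Latin letters (such as x̄ and s).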
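The parameter/statistic distinction in a short Python sketch, assuming a hypothetical simulated population of 5,000 student heights (in cm): μ and σ are computed from every individual in the population, while x̄ and s come from a random sample of 50 and will generally differ somewhat from the parameters.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 5,000 students, simulated.
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(5000)]

# Parameters describe the whole population (Greek letters).
mu = statistics.mean(population)        # population mean, μ
sigma = statistics.pstdev(population)   # population standard deviation, σ

# Statistics describe a sample (Latin letters).
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)         # sample mean, x̄
s = statistics.stdev(sample)            # sample standard deviation, s (divides by n - 1)
```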
Various claims are made about surveys. Why is each of these incorrect? • “It is always better to take a census than a sample.” • Timeliness, expense, complexity, accuracy. • “Stopping students on their way out of the cafeteria food line is a good way to sample if we want to know the quality of the food in the cafeteria.” • Bias: these students chose to eat at the cafeteria, so students who avoid it are left out.
“An internet poll taken at the website (www.statsisfun.org) garnered 12,357 responses. The majority of the respondents said they enjoyed doing statistics homework. With a sample size so large, we can be pretty sure that most stats students feel this way too.” • Voluntary response bias; a large sample does not remove the bias.
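A small simulation shows why a huge number of responses does not rescue a voluntary response poll. The numbers are hypothetical, not from the poll above: assume 30% of 100,000 students enjoy stats homework, and that enthusiasts answer a voluntary poll at a rate of 20% while everyone else answers at 2%.

```python
import random

rng = random.Random(2024)

# Hypothetical population of 100,000 students; 1 = enjoys stats homework (30%).
population = [1] * 30_000 + [0] * 70_000

# Hypothetical response rates: enthusiasts are far more likely to answer
# a voluntary internet poll than everyone else.
RESPONSE_RATE = {1: 0.20, 0: 0.02}
volunteers = [x for x in population if rng.random() < RESPONSE_RATE[x]]

# A much smaller simple random sample from the same population.
srs = rng.sample(population, 100)

true_p = sum(population) / len(population)
volunteer_p = sum(volunteers) / len(volunteers)
srs_p = sum(srs) / len(srs)
print(f"true {true_p:.2f} | volunteers (n={len(volunteers)}) {volunteer_p:.2f} | SRS (n=100) {srs_p:.2f}")
```

Under these assumptions the volunteer sample has thousands of responses yet badly overstates the true 30% (roughly 80%), while the SRS of only 100 is unbiased: it misses 30% only by ordinary sampling variability.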
Lots of New Vocabulary!!! • Population • Multistage Sample • Bias • Voluntary Response Bias • Sample • Sample Size • Pilot • Cluster Sample • Response Bias • Systematic Sample • Census • Sampling Variability • Sampling Frame • Population Parameter • Undercoverage Bias • SRS • Stratified Random Sample • Convenience Sample • Nonresponse Bias