LT 4.1—Sampling and Surveys Day 3 Notes--Bias

Population and Sample • Population: The collection of all individuals or items under consideration in a statistical study. • The population is determined by what we want to know. • Sample: That part of the population from which information is obtained. • The sample is determined by what is practical and should be representative of the population.

Sampling Designs • The sampling design is the method used to choose the sample. • All statistical sampling designs incorporate the idea that chance (randomness), rather than choice, is used to select the sample. • The value of deliberately introducing randomness is one of the great insights of Statistics. • Randomizing protects us from the influences of all the features of our population, even ones that we may not have thought about. It does that by making sure that, on average, the sample looks like the rest of the population.

The Valid Survey • It isn’t sufficient to just draw a sample and start asking questions. A valid survey yields the information we are seeking about the population we are interested in. Before you set out to survey, ask yourself: • What do I want to know? • Am I asking the right respondents? • Am I asking the right questions? • What would I do with the answers if I had them; would they address the things I want to know?

Sampling Designs • Simple Random Sample (SRS) • Stratified Sample • Cluster Sample • Systematic Sample • Multistage Sample • Convenience Sample • Voluntary Response Sample
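The first four designs in the list can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 ID numbers, the two strata, the ten "rooms" used as clusters, and all function names are hypothetical, chosen to show how chance enters each design.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every possible group of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS inside each stratum, then combine the pieces."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every step-th individual."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(1)
population = list(range(100))  # hypothetical ID numbers 0..99
strata = {"grade9": population[:50], "grade10": population[50:]}
clusters = {f"room{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)
print(srs, strat, clus, syst, sep="\n")
```

In every function the random number generator, not the person running the survey, decides who ends up in the sample. A multistage sample would combine these steps, for example by choosing clusters at random and then taking an SRS within each chosen cluster.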

Bias

Bias • Definition: Any systematic failure of a sample to represent its population. • Sampling methods that, by their nature, tend to over- or under-emphasize some characteristics of the population are said to be biased. • Bias is the bane of sampling—the one thing above all to avoid. • There is usually no way to fix a biased sample and no way to salvage useful information from it. • The best way to avoid bias is to select individuals for the sample at random. • The value of deliberately introducing randomness is one of the great insights of Statistics.

Types of Bias • Undercoverage • Voluntary Response Bias • Convenience Sample Bias • Nonresponse Bias • Response Bias

Undercoverage • A sampling scheme that fails to sample part of the population or that gives a part of the population less representation than it has in the population suffers from undercoverage. • A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats. Undercoverage is often a problem with convenience samples.

Example: Literary Digest Poll • 1936 presidential election Literary Digest magazine poll. • The survey team asked a sample of the voting population whether they would vote for Franklin D. Roosevelt, the Democratic candidate, or Alfred Landon, the Republican candidate. • Based on the results, the magazine predicted an easy win for Landon.

Results • When the actual results were in, Roosevelt won by a landslide. • What happened? • The sample was drawn from people who owned a car or had a telephone. In 1936, that group was mostly wealthy, and wealthy voters historically voted Republican. • The response rate was low: fewer than 25% of those polled responded, and a disproportionate number of those responding were Landon supporters. • Whatever the reason for the poll’s failure, the sample was not representative of the population.
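A small simulation can make the effect of undercoverage concrete. The sketch below uses an invented population of 10,000 voters, not the historical 1936 figures: 3,000 voters appear in the sampling frame (think of car and telephone owners) and lean toward the challenger, while the 7,000 voters outside the frame lean the other way. All counts and names are assumptions made for illustration.

```python
import random

rng = random.Random(1936)

# Hypothetical population of 10,000 voters (counts invented for illustration).
# Each voter is a pair: (listed_in_frame, favors_challenger).
population = (
    [(True, True)] * 1950 + [(True, False)] * 1050      # 3,000 listed in the frame
    + [(False, True)] * 2450 + [(False, False)] * 4550  # 7,000 missing from the frame
)

def share_for_challenger(voters):
    """Proportion of the given voters who favor the challenger."""
    return sum(favors for _, favors in voters) / len(voters)

true_share = share_for_challenger(population)  # 0.44 by construction

# Poll 1: an SRS of 1,000 drawn only from the incomplete frame.
frame = [voter for voter in population if voter[0]]
frame_poll = share_for_challenger(rng.sample(frame, 1000))

# Poll 2: an SRS of 1,000 drawn from the whole population.
full_poll = share_for_challenger(rng.sample(population, 1000))

print(f"true share {true_share:.3f}")
print(f"frame-only poll {frame_poll:.3f}")
print(f"full-population poll {full_poll:.3f}")
```

Under these assumptions the frame-only poll lands near 65% for the challenger even though true support is 44%. Taking a larger sample from the same frame would not help, because the missing voters never have a chance to be selected.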

Voluntary Response Bias • When choice rather than randomization is used to obtain a sample, the sample suffers from voluntary response bias. • Voluntary response bias occurs when sample members are self-selected volunteers. • An example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.). The resulting sample tends to overrepresent individuals who have strong opinions.

Convenience Sample • Is obtained exactly as its name suggests, by sampling individuals who are conveniently available. Convenience samples are often not representative of the population of interest because each individual in the population is not equally convenient to sample. • The classic example of a convenience sample is standing at a shopping mall and selecting shoppers as they walk by to fill out a survey.

Nonresponse Bias • Occurs in a sample design when individuals selected for the sample fail to respond, cannot be contacted, or decline to participate. • A common problem with mail surveys. Response rate is often low (5% - 30%), making mail surveys vulnerable to nonresponse bias.
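The following sketch illustrates how nonresponse can distort a sample that was selected properly. It assumes a hypothetical population split evenly on some proposal, and it assumes supporters return the survey 40% of the time while opponents return it only 10% of the time. Both rates are invented for illustration.

```python
import random

rng = random.Random(29)

# Hypothetical population: exactly half support a proposal (True), half oppose it (False).
population = [True] * 10_000 + [False] * 10_000

# Assumed chance of responding, by opinion (invented for illustration).
RESPONSE_RATE = {True: 0.40, False: 0.10}

# Step 1: select the sample at random, so the selection itself is fair.
selected = rng.sample(population, 2000)

# Step 2: each selected person decides whether to respond.
respondents = [opinion for opinion in selected if rng.random() < RESPONSE_RATE[opinion]]

selected_share = sum(selected) / len(selected)
respondent_share = sum(respondents) / len(respondents)
response_rate = len(respondents) / len(selected)

print(f"support among those selected:  {selected_share:.3f}")
print(f"support among those who reply: {respondent_share:.3f}")
print(f"response rate:                 {response_rate:.3f}")
```

With these assumed rates, support among the people selected stays close to 50%, but support among the people who actually reply is close to 80%. The random selection was sound; the bias comes entirely from who chose to answer.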

Response Bias • Anything in a survey that influences responses falls under the heading of response bias. • Examples are biased wording of survey questions, lack of privacy while being surveyed, and appearance of the interviewer. • Both Question Bias and Interviewer Bias are examples of response bias.

Response Bias: Question Bias • Wording of the questions or the questions themselves lead to bias. • People often don’t want to be perceived as having unpopular or unsavory views and so may not respond truthfully. • Example: Given that the threat of nuclear war is higher now than it has ever been in human history, and the fact that a nuclear war poses a threat to the very existence of the human race, would you favor an all-out nuclear test ban? • Question is biased in favor of a nuclear test ban.

Response Bias: Interviewer Bias • The sex, age, race, dress, attitude, or actions of the interviewer, and how the interviewer asks the questions, influence the way a subject responds. • Example: A male interviewer asking women sex-related questions. • To prevent this, interviewers must be trained to remain neutral throughout the interview. They must also pay close attention to the way they ask each question. If an interviewer changes the way a question is worded, it may affect the respondent's answer.

Sampling Variability • Sampling variability is the natural tendency of randomly drawn samples to differ, one from another. • Sampling variability is not an error, just the natural result of random sampling. • Statistics attempts to minimize, control, and understand variability so that informed decisions can be drawn from the data despite their variation. • Although samples vary, when we use chance to select them, they do not vary haphazardly but rather according to the laws of probability.

Example: Sampling Variability • Each of four major news organizations surveys likely voters and separately reports that the percentage favoring the incumbent candidate is 53.5%, 54.1%, 52%, and 54.2%, respectively. • What is the correct percentage? • Did three or more of the news organizations make a mistake?

Solution • There is no way of knowing the correct population percentage from the information given. • The four surveys led to four statistics, each an estimate of the population parameter. • No one made a mistake unless there was a bad survey. • Sampling variation is natural.
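To see that four honest polls will naturally disagree, the sketch below simulates four random samples of 1,000 voters from the same population. The true proportion of 53.5% and the sample size are hypothetical values chosen for the simulation; in a real survey the parameter is unknown.

```python
import random

rng = random.Random(4)

TRUE_P = 0.535  # hypothetical population parameter, unknown in practice
N = 1000        # hypothetical number of voters in each poll

def poll(n, p, rng):
    """Proportion favoring the incumbent in one random sample of n voters."""
    return sum(rng.random() < p for _ in range(n)) / n

# Four organizations, four independent random samples, four statistics.
estimates = [poll(N, TRUE_P, rng) for _ in range(4)]

# Typical size of the sample-to-sample variation for a proportion.
standard_error = (TRUE_P * (1 - TRUE_P) / N) ** 0.5

print("poll results:", estimates)
print("standard error:", round(standard_error, 4))
```

For samples of this size the standard error is about 0.016, so differences of one or two percentage points between polls, like those in the example, are what the laws of probability lead us to expect.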

Stratified Sampling

Cluster Sampling

Systematic Samples

What Can Go Wrong? Or, How to Sample Badly • Sample Badly with Volunteers: • In a voluntary response sample, a large group of individuals is invited to respond, and all who do respond are counted. • Voluntary response samples are almost always biased, and so conclusions drawn from them are almost always wrong. • Voluntary response samples are often biased toward those with strong opinions or those who are strongly motivated. • Since the sample is not representative, the resulting voluntary response bias invalidates the survey.

What Can Go Wrong? Or, How to Sample Badly • Sample Badly, but Conveniently: • In convenience sampling, we simply include the individuals who are convenient. • Unfortunately, this group may not be representative of the population. • Convenience sampling is not only a problem for students or other beginning samplers. • In fact, it is a widespread problem in the business world: the easiest people for a company to sample are its own customers.

What Can Go Wrong? Or, How to Sample Badly • Sample from a Bad Sampling Frame: • An SRS from an incomplete sampling frame introduces bias because the individuals included may differ from the ones not in the frame. • Undercoverage: • Many of these bad survey designs suffer from undercoverage, in which some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population. • Undercoverage can arise for a number of reasons, but it’s always a potential source of bias.

What Else Can Go Wrong? • Watch out for nonrespondents. • A common and serious potential source of bias for most surveys is nonresponse bias. • No survey succeeds in getting responses from everyone. • The problem is that those who don’t respond may differ from those who do. • And they may differ on just the variables we care about.

What Else Can Go Wrong? • Work hard to avoid influencing responses. • Response bias refers to anything in the survey design that influences the responses. • For example, the wording of a question can influence the responses: • Given the fact that those who understand Statistics are smarter and better looking than those who don’t, don’t you think it is important to take a course in Statistics?

How to Think About Biases • Look for biases in any survey you encounter before you collect the data. There’s no way to recover from a biased sample or a survey that asks biased questions. • Spend your time and resources reducing biases. • If you possibly can, pilot-test your survey. • Always report your sampling methods in detail.

What have we learned? • A representative sample can offer us important insights about populations. • It’s the size of the sample, not its fraction of the larger population, that determines the precision of the statistics it yields. • There are several ways to draw samples, all based on the power of randomness to make them representative of the population of interest: • Simple Random Sample, Stratified Sample, Cluster Sample, Systematic Sample, Multistage Sample
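The claim that sample size, not the sampled fraction, determines precision can be checked with a short simulation. The sketch below draws repeated simple random samples from two hypothetical populations (10,000 and 1,000,000 individuals, each with a true proportion of 0.5) and measures how much the sample proportions vary. The population sizes, sample sizes, and number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(63)

def spread_of_estimates(pop_size, n, reps, rng):
    """Standard deviation of the sample proportion over repeated SRSs.

    Individuals are numbered 0 .. pop_size - 1, and the lower half are the
    'yes' group, so the true proportion is 0.5.
    """
    yes_cutoff = pop_size // 2
    estimates = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)
        estimates.append(sum(i < yes_cutoff for i in sample) / n)
    return statistics.stdev(estimates)

small_pop = spread_of_estimates(10_000, 400, 500, rng)     # sample is 4% of the population
large_pop = spread_of_estimates(1_000_000, 400, 500, rng)  # sample is 0.04% of the population
bigger_n = spread_of_estimates(1_000_000, 1600, 500, rng)  # four times the sample size

print(f"n=400  from 10,000:     {small_pop:.4f}")
print(f"n=400  from 1,000,000:  {large_pop:.4f}")
print(f"n=1600 from 1,000,000:  {bigger_n:.4f}")
```

The two samples of 400 should show nearly the same spread (roughly 0.025) even though one covers 4% of its population and the other only 0.04%. Quadrupling the sample size to 1,600 cuts the spread roughly in half.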

What have we learned? • Bias can destroy our ability to gain insights from our sample: • Nonresponse bias can arise when sampled individuals will not or cannot respond. • Response bias arises when respondents’ answers might be affected by external influences, such as question wording or interviewer behavior.

What have we learned? • Bias can also arise from poor sampling methods: • Voluntary response samples are almost always biased and should be avoided and distrusted. • Convenience samples are likely to be flawed for similar reasons. • Even with a reasonable design, sampling frames may not be representative. • Undercoverage occurs when individuals from a subgroup of the population are selected less often than they should be.

What have we learned? • Finally, we must look for biases in any survey we find and be sure to report our methods whenever we perform a survey so that others can evaluate the fairness and accuracy of our results.