Chapter 5: Producing Data

Section 5.1 – Designing Samples

Introduction • Our goal in choosing a sample is to obtain a picture of the population that is disturbed as little as possible by the act of gathering information. • Sample surveys are one kind of observational study. • In other settings, we gather data from an experiment.

Observation vs. Experiment • An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. • An experiment, on the other hand, deliberately imposes some treatment on individuals in order to observe their responses. • Both have important roles depending on the situation and the questions to be answered. • See example 5.1 on p.270

Additional facts • Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables. • In some situations, it may not be possible to observe individuals directly or to perform an experiment. In other cases, it may be logistically difficult or simply inconvenient.

Population and Sample • The entire group of individuals that we want information about is called the population. • A sample is a part of the population that we actually examine in order to gather information.

Sampling vs. Census • Sampling involves studying a part in order to gain information about the whole. • A census attempts to contact every individual in the entire population. • The design of a sample refers to the method used to choose the sample from the population. • Poor sample design can produce misleading conclusions.

Voluntary Response Sample • A voluntary response sample consists of people who choose themselves by responding to a general appeal. • Voluntary response samples are biased because people with strong opinions, especially negative opinions, are most likely to respond. • See example 5.2 on p.272 • Another type of bad sampling is convenience sampling, which chooses the individuals easiest to reach. • See example 5.3 on p.272
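A small simulation can make the bias of voluntary response concrete. This is only a sketch: the population, the 30% share of negative opinions, and the two response rates are made-up assumptions, chosen so that people with negative opinions are ten times as likely to answer a general appeal.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: True means the person holds a negative opinion.
population = [True] * 3000 + [False] * 7000
true_share = sum(population) / len(population)

# Assumed response rates to a general appeal (invented for illustration).
RESPONSE_RATE = {True: 0.20, False: 0.02}

# Voluntary response: each person decides for themselves whether to respond.
voluntary = [opinion for opinion in population
             if rng.random() < RESPONSE_RATE[opinion]]
voluntary_share = sum(voluntary) / len(voluntary)

# For comparison, a sample of 500 people chosen by chance.
srs = rng.sample(population, 500)
srs_share = sum(srs) / len(srs)

print(f"true share negative:      {true_share:.2f}")
print(f"voluntary response share: {voluntary_share:.2f}")
print(f"random sample share:      {srs_share:.2f}")
```

Under these assumptions the voluntary response sample greatly overstates the share of negative opinions, while the sample chosen by chance lands close to the true value.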

Bias • The design of a study is biased if it systematically favors certain outcomes. • Choosing a sample by chance attacks bias by giving all individuals an equal chance to be chosen.

Simple Random Sample • A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. • An SRS not only gives each individual an equal chance to be chosen (thus avoiding bias in the choice) but also gives every possible sample an equal chance to be chosen. • Not all random samples are simple random samples: • Drawing names from a hat without replacement does give an SRS. If I have 10 names in a hat, the first name drawn has a 1 in 10 chance, and each of the nine names left then has a 1 in 9 chance of being drawn second. Before the drawing starts, however, every name, and every set of n names, is equally likely to end up in the sample. • A random sample fails to be an SRS when some sets of individuals cannot be chosen together. For example, choosing 5 boys at random and 5 girls at random from a class of 10 boys and 10 girls gives every student the same 1 in 2 chance of being chosen, but a sample of 10 girls is impossible, so the result is not a simple random sample.
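The defining property of an SRS, that every set of n individuals is equally likely, can be checked by simulation. The sketch below assumes a hypothetical population of four labels and uses Python's standard `random.sample`, which draws without replacement; the seed and the number of repetitions are arbitrary.

```python
import random
from collections import Counter
from itertools import combinations

def simple_random_sample(population, n, rng):
    """Return n distinct individuals drawn without replacement."""
    return rng.sample(population, n)

rng = random.Random(1)             # fixed seed so the run is repeatable
population = ["A", "B", "C", "D"]  # hypothetical labels

# Draw many SRSs of size 2 and count how often each possible pair appears.
counts = Counter(
    frozenset(simple_random_sample(population, 2, rng)) for _ in range(6000)
)

# There are 6 possible pairs; each should appear roughly 1000 times.
possible_pairs = [frozenset(p) for p in combinations(population, 2)]
for pair in possible_pairs:
    print(sorted(pair), counts[pair])
```

Each of the six possible pairs should turn up about equally often, which is what "every set of n individuals has an equal chance" means in practice.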

Random Digits • A table of random digits is a long string of the digits 0,1,2,3,4,5,6,7,8,9 with the following two properties: • Each entry in the table is equally likely to be any of the 10 digits 0 through 9. • The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part. *Table B (in the back of your textbook) is a Random Digit Table

Choosing an SRS Choose an SRS in two steps: • Label: Assign a numerical label to every individual in the population. • Table: Use Table B to select labels at random. • See example 5.4 on p.276
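The label-then-table procedure can be sketched in code. The example assumes a population of 30 individuals labeled 01 to 30 and reads two-digit groups from a string of digits, skipping groups that are out of range or already chosen. The digit string is made up for illustration and is not taken from Table B; the function name is also hypothetical.

```python
def srs_from_digits(digits, population_size, n):
    """Read two-digit groups left to right and keep labels 01..population_size,
    skipping out-of-range groups and repeats, until n labels are chosen."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            return chosen
    raise ValueError("ran out of digits before choosing n labels")

# Made-up line of digits: 48 07 15 32 99 06 12 23 07 61 18 00 25 03
line = "4807153299061223076118002503"
print(srs_from_digits(line, population_size=30, n=5))  # [7, 15, 6, 12, 23]
```

Here 48, 32, and 99 are skipped because no individual carries those labels. All labels must have the same number of digits so that every individual has the same chance of being read from the table.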

Probability Sample • A probability sample is a sample chosen by chance. We must know what samples are possible and what chance, or probability, each possible sample has. • Systematic sampling is a type of probability sampling in which one element is selected at random from the first k elements of the list (the sampling frame), and then every kth element after it is selected to be part of the sample. • The use of chance to select the sample is the essential principle of statistical sampling.
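A minimal sketch of systematic sampling, assuming a hypothetical frame of 100 labeled individuals and k = 10. The random start is drawn with `random.Random.randrange`; the seed is arbitrary.

```python
import random

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k positions, then take every kth element."""
    start = rng.randrange(k)  # 0, 1, ..., k-1
    return frame[start::k]

rng = random.Random(7)        # fixed seed so the run is repeatable
frame = list(range(1, 101))   # hypothetical sampling frame: labels 1..100
sample = systematic_sample(frame, 10, rng)
print(sample)
```

Only 10 different samples are possible here, one for each starting position, so every individual has a 1 in 10 chance of being chosen but most sets of 10 individuals can never be the sample. A systematic sample is therefore a probability sample without being an SRS.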

Stratified Random Sample • To select a stratified random sample, first divide the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample. • This method is usually used for sampling from large populations spread out over a wide area.
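The two steps of a stratified random sample (a separate SRS in each stratum, then combine) might look like the following. The strata, their sizes, and the per-stratum sample sizes are hypothetical.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Take a separate SRS from each stratum and combine them into one sample."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

rng = random.Random(3)  # fixed seed so the run is repeatable
strata = {
    "freshmen": [f"F{i}" for i in range(1, 41)],  # 40 hypothetical freshmen
    "seniors": [f"S{i}" for i in range(1, 21)],   # 20 hypothetical seniors
}
sizes = {"freshmen": 4, "seniors": 2}             # 10% of each stratum
sample = stratified_sample(strata, sizes, rng)
print(sample)
```

Every run yields exactly four freshmen and two seniors, so a sample made up only of seniors is impossible. That guarantee is the point of stratifying, and it is also why a stratified sample is not an SRS of the whole population.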

Cluster Sampling • With cluster sampling, the researcher divides the population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population and ALL members in the cluster become part of the sample. • For example, if a researcher is studying the attitudes of Catholic Church members surrounding the recent exposure of sex scandals in the Catholic Church, he or she might first sample a list of Catholic churches across the country. Let’s say that the researcher selected 50 Catholic Churches across the United States. He or she would then survey all church members from those 50 churches.
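Cluster sampling reverses the roles used in stratified sampling: chance selects whole groups, and everyone in a selected group is included. A sketch, assuming four hypothetical churches with made-up member lists:

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Choose an SRS of clusters, then include ALL members of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

rng = random.Random(9)  # fixed seed so the run is repeatable
clusters = {            # hypothetical churches and their members
    "church_1": ["a1", "a2", "a3"],
    "church_2": ["b1", "b2"],
    "church_3": ["c1", "c2", "c3", "c4"],
    "church_4": ["d1", "d2"],
}
chosen, members = cluster_sample(clusters, 2, rng)
print(chosen)
print(members)
```

The sample size is not fixed in advance, because it depends on how many members the selected clusters happen to have.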

Multistage Sampling • A typical example of multistage sampling is the sampling design of the Current Population Survey, which is conducted as follows: • Stage 1: Divide the United States into 2007 geographical areas called Primary Sampling Units (PSUs) and select a sample of these PSUs. • Stage 2: Divide each PSU selected into smaller areas called “neighborhoods” using ethnic and other information, and take a stratified sample of neighborhoods. • Stage 3: Sort the housing units in each neighborhood into clusters of four nearby units. Interview the households in a random sample of these clusters. • This method saves time and money.

Cautions about sample surveys • We need a complete and accurate list of the population. • Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. • See example 5.6 on p.281 • Nonresponse occurs when an individual chosen for the sample can’t be contacted or does not cooperate. • Response bias occurs when the behavior of the respondent or the interviewer distorts the answers. Respondents may lie, especially if asked about illegal or unpopular behaviors. • An interviewer whose attitude suggests that some answers are more desirable than others will get these answers more often. • The wording of questions is the most important influence on the answers given to a sample survey. • See example 5.7 on p.282

Inference about the population • Using chance to choose a sample eliminates bias in the actual selection of the sample. • Because we deliberately use chance, the results obey the laws of probability that govern chance behavior. • Larger random samples give more accurate results than smaller samples.
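The claim that larger random samples give more accurate results can be illustrated by repeating the sampling many times and measuring how much the sample proportion varies. The sketch below assumes a hypothetical population of 10,000 in which 60% have some characteristic; the sample sizes, the number of repetitions, and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = has the characteristic, true proportion 0.6.
population = [1] * 6000 + [0] * 4000

def sample_proportion(n):
    """Proportion of 1s in one SRS of size n."""
    return sum(rng.sample(population, n)) / n

def spread(n, repeats=500):
    """Standard deviation of the sample proportion over many SRSs of size n."""
    return statistics.pstdev([sample_proportion(n) for _ in range(repeats)])

spread_small = spread(25)
spread_large = spread(400)
print(f"spread with n = 25:  {spread_small:.3f}")
print(f"spread with n = 400: {spread_large:.3f}")
```

Both sample sizes give results centered on the true proportion, since chance selection avoids bias, but the results from samples of 400 vary much less from sample to sample than those from samples of 25.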

Homework:
