Chapter 5

Chapter 5 Producing Data

5.1 Objectives • Define population and sample • Explain how sampling differs from a census • Explain what is meant by a voluntary response sample • Give an example of a voluntary response sample • Explain what is meant by convenience sampling • Define what it means for a sampling method to be biased • Define, carefully, a simple random sample • List the four steps involved in choosing an SRS • Explain what is meant by systematic random sampling • Use a table of random digits to select a simple random sample

5.1 Objectives • Define a probability sample • Given a population, determine the strata of interest, and select a stratified random sample • Define a cluster sample • Define undercoverage and nonresponse as sources of bias in sample surveys • Give an example of response bias in a survey question • Write a survey question in which the wording of the question is likely to influence the response • Identify the major advantage of large random samples

Observational Study versus Experiment • Observational Study • Observe individuals and measure variables of interest but do not attempt to influence the response. • Experiment • Deliberately impose some treatment on individuals in order to observe their responses. • A response variable measures an outcome of a study. • An explanatory variable helps explain or influences changes in a response variable.

Population versus Sample • Population • The entire group of individuals that we want information about. • Sample • A part of the population that we actually examine in order to gather information

Census…Sampling • Census • This attempts to contact every individual in the entire population • Sampling • This involves studying part in order to gain information about the whole

Why sample? • What are some reasons why we use samples instead of a census?

Example • Page 333 #5.1 Students as customers. • A committee on community relations in a college town plans to survey local businesses about the importance of students as customers. From telephone book listings, the committee chooses 150 businesses at random. Of these, 73 return the questionnaire mailed by the committee. • What is the population for this sample survey? • What is the sample? • What proportion of the businesses chosen did not respond?
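
The last question is a short calculation. A minimal sketch in Python, using only the two counts given in the exercise (the variable names are illustrative):

```python
# Counts taken from the exercise: 150 businesses chosen, 73 questionnaires returned.
sample_size = 150
returned = 73

# Businesses that were chosen but did not send the questionnaire back.
nonrespondents = sample_size - returned
nonresponse_rate = nonrespondents / sample_size

print(f"Nonrespondents: {nonrespondents}")
print(f"Nonresponse rate: {nonresponse_rate:.3f}")
```

Slightly more than half of the chosen businesses did not respond, which is worth keeping in mind when nonresponse bias comes up later in the chapter.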

Types of Samples • Voluntary Response Sample • Consists of people who choose themselves by responding to a general appeal. • Biased because people with strong opinions (especially negative opinions) are most likely to respond.

Types of Samples • Convenience Sampling • Choosing individuals that are easiest to reach.

Bad Sampling Techniques

Bias • A sampling method is biased if it systematically favors certain outcomes. • A statistic is said to be unbiased if the mean of its sampling distribution equals the true value of the parameter being estimated.

Example • Page 334 #5.8: Explain it to the congresswoman • You are on the staff of a member of Congress who is considering a bill that would provide government-sponsored insurance for nursing home care. You report that 1128 letters have been received on the issue, of which 871 oppose the legislation. “I’m surprised that most of my constituents oppose the bill. I thought that it would be quite popular,” says the congresswoman. • Are you convinced that a majority of the voters oppose the bill? • How would you explain the statistical issue to the congresswoman?
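
One way to see why the letters may mislead is a small simulation. The sketch below is not based on data from the exercise: the population size, the true level of opposition, and the two letter-writing rates are all hypothetical values, chosen only to show how a voluntary response sample can overstate opposition.

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population: 40% of 100,000 constituents oppose the bill.
POPULATION_SIZE = 100_000
TRUE_OPPOSE_RATE = 0.40
# Hypothetical assumption: opponents are five times as likely to write a letter.
WRITE_RATE_OPPOSE = 0.02
WRITE_RATE_SUPPORT = 0.004

letters_oppose = 0
letters_support = 0
for _ in range(POPULATION_SIZE):
    opposes = rng.random() < TRUE_OPPOSE_RATE
    write_rate = WRITE_RATE_OPPOSE if opposes else WRITE_RATE_SUPPORT
    if rng.random() < write_rate:  # this constituent chooses to write
        if opposes:
            letters_oppose += 1
        else:
            letters_support += 1

total_letters = letters_oppose + letters_support
share_opposed_in_letters = letters_oppose / total_letters

print(f"True share opposed:       {TRUE_OPPOSE_RATE:.2f}")
print(f"Share opposed in letters: {share_opposed_in_letters:.2f}")
```

Under these assumed rates, a clear majority of the letters oppose the bill even though only a minority of the simulated constituents do.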

Sampling Methods • Simple Random Sample (SRS) • An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected. • Think of putting everyone’s name in a hat and drawing.

Random Digits • A table of random digits is a long string of the digits 0 – 9 with these two properties • Each entry in the table is equally likely to be any of the 10 digits • The entries are independent of each other

Using the table to select an SRS • Step 1: Label • Assign a numerical label to every individual in the population • Step 2: Table • Use Table B to select numbers at random • Step 3: Stopping Rule • Indicate when you should stop sampling • Step 4: Identify Sample • Use the labels to identify the subjects selected in the sample
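
The four steps can be sketched in Python. This is an illustration under stated assumptions, not a reproduction of Table B: the function name `srs_from_digits` is hypothetical, and the digit string is generated by Python's `random` module as a stand-in for a printed line of random digits.

```python
import random


def srs_from_digits(population, n, digits):
    """Pick n distinct individuals by reading fixed-width labels from a digit string."""
    # Step 1: Label. Every label has the same number of digits (01, 02, ...).
    width = len(str(len(population)))
    labels = {f"{i:0{width}d}": item for i, item in enumerate(population, start=1)}

    chosen = []
    # Step 2: Table. Read consecutive groups of `width` digits.
    for start in range(0, len(digits) - width + 1, width):
        # Step 3: Stopping rule. Stop once n distinct labels have been chosen.
        if len(chosen) == n:
            break
        group = digits[start:start + width]
        # Ignore groups that are not labels and labels that were already chosen.
        if group in labels and group not in chosen:
            chosen.append(group)

    if len(chosen) < n:
        raise ValueError("ran out of digits before the sample was complete")

    # Step 4: Identify sample. Translate the labels back into individuals.
    return [labels[label] for label in chosen]


# Stand-in for a table of random digits (NOT the textbook's Table B).
rng = random.Random(131)
digits = "".join(str(rng.randrange(10)) for _ in range(2000))

# Hypothetical population of 28 placeholder names.
resorts = [f"Resort {i}" for i in range(1, 29)]
sample = srs_from_digits(resorts, 3, digits)
print(sample)
```

Giving every label the same width matters: it ensures each individual is equally likely to match the next group of digits read from the table.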

Example • Page 341 #5.10 Spring Break Destinations. • A campus newspaper plans a major article on spring break destinations. The authors intend to call a few randomly chosen resorts at each destination to ask about their attitudes toward groups of students as guests. • The table contains the resorts listed in one city. The first step is to label this population as shown. • Enter Table B at line 131, and choose three resorts.

Example continued

Gettysburg Address

Sampling Methods • Probability Sample • Sample chosen by chance. • We must know what samples are possible and what chance, or probability, each possible sample has.

Sampling Methods • Stratified Random Sample • First divide the population into groups of individuals (called strata). • Individuals in each stratum are similar in some way that is important to the response. • Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.
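
A minimal sketch of the procedure in Python follows. The function name `stratified_sample`, the student list, and the per-stratum sample sizes are hypothetical, introduced only for illustration.

```python
import random


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Take a separate SRS inside each stratum and combine the results."""
    # Divide the population into strata.
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)

    # Choose an SRS within each stratum, then pool them into one sample.
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum[name]))
    return sample


rng = random.Random(0)
# Hypothetical population: 60 freshmen and 40 seniors.
students = [(f"Student {i}", "freshman" if i <= 60 else "senior") for i in range(1, 101)]
sample = stratified_sample(
    students,
    stratum_of=lambda student: student[1],
    n_per_stratum={"freshman": 6, "senior": 4},
    rng=rng,
)
print(sample)
```

Here the sample sizes are proportional to the stratum sizes, which is one common choice; other allocations are possible.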

The River Problem

Sampling Methods • Cluster Sampling • Divide the population into groups, or clusters. • Some of these clusters are randomly selected. • All individuals in the chosen clusters are selected to be in the sample.
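
For contrast with stratified sampling, here is a minimal Python sketch of cluster sampling. The function name `cluster_sample` and the dormitory floors are hypothetical examples.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every individual inside them."""
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen_names for person in clusters[name]]
    return chosen_names, sample


rng = random.Random(0)
# Hypothetical population: 8 dormitory floors with 5 residents each.
floors = {
    f"Floor {f}": [f"Floor {f} resident {r}" for r in range(1, 6)]
    for f in range(1, 9)
}
chosen_floors, sample = cluster_sample(floors, 2, rng)
print(chosen_floors)
print(sample)
```

Note the difference from the stratified design: randomness is used to pick the groups, not the individuals within them.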

Cautions about Sample Surveys • Undercoverage Bias • Occurs when some groups in the population are left out of the process of choosing the sample.

Cautions about Sample Surveys • Nonresponse Bias • Occurs when an individual chosen for the sample can’t be contacted or does not cooperate.

Cautions about Sample Surveys • Response Bias • Influences caused by the behavior of the respondent or of the interviewer

GIGO • Garbage In Garbage Out • Need to learn about producing data that can be used for making valid statements. • Need to recognize techniques that produce biased samples.

Question Wording • The wording of the question is the most important influence on the answers given in a sample survey. • Confusing questions • Leading Questions • Choice of words

Ask • Insist on knowing the following before believing a survey. • The exact question asked • The rate of nonresponse • The date of the survey • The method of the survey

Samples Vary, Parameters are Fixed
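
A short simulation can illustrate this slide title and the advantage of large random samples. The parameter value 0.6 and the sample sizes are hypothetical, and each draw is modeled as an independent trial, which approximates sampling from a very large population.

```python
import random
import statistics

rng = random.Random(61)  # fixed seed so the run is repeatable
TRUE_P = 0.6  # hypothetical population proportion (the fixed parameter)


def sample_proportion(n):
    """Proportion of 'yes' answers in one simulated random sample of size n."""
    return sum(rng.random() < TRUE_P for _ in range(n)) / n


# Repeat the sampling many times at two different sample sizes.
small = [sample_proportion(25) for _ in range(500)]
large = [sample_proportion(400) for _ in range(500)]

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print(f"n = 25:  mean {statistics.mean(small):.3f}, spread {spread_small:.3f}")
print(f"n = 400: mean {statistics.mean(large):.3f}, spread {spread_large:.3f}")
```

The parameter never changes, but the sample proportions do; the larger samples cluster more tightly around it.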

5.2 Designing Experiments

5.2 Objectives • Define experimental units, subjects and treatments. • Define factor and level. • Given the number of factors and the number of levels for each factor, determine the number of treatments. • Explain the major advantage of an experiment over an observational study. • Give an example of the placebo effect. • Explain the purpose of a control group. • Explain the difference between control and a control group. • Discuss the purpose of replication, and give an example of replication in the design of an experiment. • Discuss the purpose of randomization in the design of an experiment. • Given a list of subjects, use a table of random numbers to assign individuals to treatment and control groups.

5.2 Objectives • List the three main principles of experimental design. • Explain what it means to say that an observed effect is statistically significant. • Define a completely randomized design. • For an experiment, generate an outline of a completely randomized design. • Define a block. • Give an example of a block design in an experiment. • Explain how a block design may be better than a completely randomized design. • Give an example of a matched pairs design, and explain why matched pairs are an example of block designs. • Give an example in which lack of realism negatively affects our ability to generalize the results of a study.

Vocabulary • Experimental Units • The individuals on which the experiment is done. • Subjects • When the units are humans they are called subjects. • Treatment • A specific experimental condition applied to the units is called a treatment.

Response variables measure an outcome of a study • These are also called dependent variables. • Explanatory variables help explain or influence changes in a response variable. • These are also called factors or independent variables.

Principles of Experimental Design • Control • The purpose of control is to try to eliminate the confounding effects of lurking variables. • Replication • Randomization

Control • Placebo Effect • Control Group

Example • Page 357 # 5.35: Improving Response Rate • How can we reduce the rate of refusals in telephone surveys? Most people who answer at all listen to the interviewer’s introductory remarks and then decide whether to continue. • One study made telephone calls to randomly selected households to ask opinions about the next election. In some calls the interviewer gave her name, in others she identified the university she was representing, and in still others she identified both herself and the university. • For each type of call the interviewer either did or did not offer to send a copy of the final survey results to the person interviewed. Do these differences in the introduction affect whether the interview is completed? • Identify the experimental units or subjects, the factors, the treatments and the response variables.
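
When an experiment has more than one factor, each treatment is one combination of factor levels. The sketch below enumerates the combinations for the two factors described in this example; the short level names are paraphrases chosen for illustration.

```python
from itertools import product

# Factors and their levels, paraphrased from the exercise.
factors = {
    "introduction": ["name only", "university only", "name and university"],
    "offer of results": ["offered", "not offered"],
}

# Each treatment is one level of every factor.
treatments = list(product(*factors.values()))

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
print(f"Number of treatments: {len(treatments)}")
```

The number of treatments is the product of the numbers of levels, here 3 × 2.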

Example • Page 357 # 5.37: Sham Operation • In the mid 1900s, a common treatment for angina (a disease marked by brief attacks of chest pain caused by insufficient oxygen to the heart) was called internal mammary ligation. In this procedure doctors made small incisions in the chest and tied knots in two arteries to try to increase blood flow to the heart. It was a popular procedure: 90% of patients reported that it helped reduce pain. In 1960, Seattle cardiologist Dr. Leon Cobb carried out an experiment where he compared ligation with a procedure in which he made incisions but did not tie off the arteries. This sham operation proved just as successful, and the ligation procedure was abandoned as a treatment for angina. • What is the response variable in Dr. Cobb’s experiment? • Dr. Cobb showed that the sham operation was just as successful as ligation. What term do we use to describe the phenomenon that many subjects report good results from a pretend treatment? • The ligation procedure is an example of the lack of an important property of a well designed experiment. What is that property?

Principles of Experimental Design • Control • Replicate • Purpose is not to eliminate chance variation but to reduce its role and increase sensitivity of the experiment to differences between treatments. • Randomize

Principles of Experimental Design • Control • Replicate • Randomize • Use chance to assign experimental units to treatments in an attempt to balance groups according to different variables

Statistically Significant • An observed effect so large that it would rarely occur by chance is called statistically significant.
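
One way to make "would rarely occur by chance" concrete is to re-randomize the data many times and count how often chance alone produces an effect as large as the one observed. The sketch below uses made-up response values for two treatment groups; it illustrates the idea and is not a method taken from these slides.

```python
import random

rng = random.Random(85)  # fixed seed so the run is repeatable

# Hypothetical responses for two treatment groups of 8 units each.
group_1 = [24, 27, 31, 29, 26, 30, 28, 32]
group_2 = [21, 23, 25, 22, 26, 24, 20, 23]


def mean(values):
    return sum(values) / len(values)


observed = mean(group_1) - mean(group_2)

# Shuffle the pooled responses to mimic random assignment with no treatment effect.
pooled = group_1 + group_2
REPS = 10_000
at_least_as_large = 0
for _ in range(REPS):
    rng.shuffle(pooled)
    diff = mean(pooled[:len(group_1)]) - mean(pooled[len(group_1):])
    if diff >= observed:
        at_least_as_large += 1

chance_rate = at_least_as_large / REPS
print(f"Observed difference in means: {observed}")
print(f"Share of shuffles at least as large: {chance_rate:.4f}")
```

If that share is very small, the observed difference is hard to attribute to chance assignment alone.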

Different Experiments • Comparative Experiments: • 1) Treatment → Observation • 2) Observation → Treatment → Observation • Completely Randomized Experiments:

Randomized Experiments • Random Allocation → Group 1 → Treatment 1 → Measure • Random Allocation → Group 2 → Treatment 2 → Measure • Compare Results

Example • Page 364 # 5.39: Treating Prostate Disease • A large study used records from Canada’s national health care system to compare the effectiveness of two ways to treat prostate disease. The two treatments are traditional surgery and a new method that does not require surgery. The records described many patients whose doctor had chosen each method. The study showed that the patients treated by the new method were significantly more likely to die within 8 years.

Page 364 # 5.39: Treating Prostate Disease Continued • Further study of the data showed that this conclusion was wrong. The extra deaths among patients who got the new method could be explained by lurking variables. What lurking variables might be confounded with a doctor’s choice of surgical or nonsurgical treatment? • You have 300 prostate patients who are willing to serve as subjects in an experiment to compare the two methods. Use a diagram to outline the design of a randomized comparative experiment. (When using a diagram to outline the design of an experiment, be sure to indicate the size of the treatment groups and the response variable.)
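
The random allocation step of such a design can be sketched in Python. The function name `randomize`, the patient labels 1 to 300, and the equal split into two groups of 150 are assumptions made for illustration; the exercise itself only states that there are 300 subjects and two methods.

```python
import random


def randomize(units, treatments, rng):
    """Completely randomized design: shuffle the units, then deal them into equal groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units cannot be split into equal groups")
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }


rng = random.Random(2024)
patients = list(range(1, 301))  # hypothetical labels for the 300 subjects
groups = randomize(patients, ["surgery", "new method"], rng)

for treatment, members in groups.items():
    print(treatment, len(members), sorted(members)[:5])
```

The same function handles more than two treatments, as long as the number of units divides evenly among them.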

Example • Page 364 # 5.40: Headache Relief • Doctors identify “chronic tension-type headaches” as headaches that occur almost daily for at least 6 months. Can antidepressant medications or stress management training reduce the number and severity of these headaches? Are both together more effective than either alone? • Investigators compared four treatments: antidepressant alone, placebo alone, antidepressant plus stress management, and placebo plus stress management. • Outline the design of the experiment. The headache sufferers named in the following table have agreed to participate in the study. • Use Table B line 130 to randomly assign the subjects to the treatments.

Calculator Practice • Selecting random samples by calculator • 1. STAT/4:ClrList/2nd/L1/ENTER. This clears any data in list L1. • 2. MATH/PRB/5:randInt(a, b, c)→L1. This chooses c random integers between a and b, inclusive, and stores them in list L1.

Calculator Practice • To “seed” the calculator so that all of us get the same random numbers
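
The same idea can be sketched in Python rather than on the calculator. The function name `draw_labels` and the seed value are hypothetical; the point is only that two runs started from the same seed produce the same sequence of random integers.

```python
import random


def draw_labels(seed, low, high, count):
    """Return `count` random integers between low and high (inclusive) for a given seed."""
    rng = random.Random(seed)  # seeding fixes the sequence that follows
    return [rng.randint(low, high) for _ in range(count)]


first = draw_labels(seed=5, low=1, high=30, count=5)
second = draw_labels(seed=5, low=1, high=30, count=5)

print(first)
print(second)
print("Same numbers:", first == second)
```

Integers drawn this way can repeat. When choosing a sample, skip any repeated label and keep drawing, or in Python use `random.sample`, which never repeats an item.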
