Chapter 5: Producing Data

Chapter 5: Producing Data “An approximate answer to the right question is worth a good deal more than the exact answer to an approximate question.” John Tukey

5.1 Designing Samples (p. 245-261) (Overview) • The sampling process must be designed carefully in order to obtain reliable statistical information. • Good sampling techniques, many of which involve the use of chance, produce meaningful and useful results. • Bad sampling techniques produce worthless data.

Definitions • Voluntary response sample • Consists of people who choose themselves by responding to a general appeal. • Example: Listeners who call in to respond to a talk show question • Two variables are confounded when their effects on a response variable cannot be distinguished from one another. • See Example 5.2 in the textbook, in which the explanatory variable (the reading of favorable propaganda) and the events of history are confounded.

Definitions (cont’d.) • Statistical Inference: provides methods for giving “reasonable” answers to specific questions by examining data. • Population: the entire group about which information is desired. • Sample: the part of the population that is examined in order to obtain information about the population.

Definitions (cont’d.) • Sampling Frame: the list of individuals from which a sample is actually selected. • Example: • Population: adult residents of Delaware County • Sampling Frame: voter registration roll • Design: the method that is used to select the sample.

Definitions (cont’d.) • Convenience Sample: a sample made up of the individuals who are easiest to reach. • Examples: • Opinions offered by shoppers entering or leaving a WaWa or Borders in Springfield (used by Daily Times) • Opinions offered by students of a Catholic school (used by Catholic Standard and Times) • Biased Sample: a sample chosen by a method that systematically favors certain outcomes.

Definitions (cont’d.) • Simple random sample (SRS) of size n: a sample chosen in such a way that every set of n individuals has an equal chance of being the sample selected. • Sometimes this is easier said than done! It can be tricky to obtain an SRS. • Probability sample: a sample chosen by chance, in which each member of the population has a known chance of being chosen.
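The SRS definition above can be illustrated with a minimal Python sketch. The sampling frame of 500 numbered individuals and the function name simple_random_sample are assumptions made for illustration; they do not come from the textbook. The standard library's random.sample draws without replacement, so every set of n individuals is equally likely to be chosen.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n drawn without replacement.

    The optional seed only makes the example repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical sampling frame: individuals labeled 1 through 500.
frame = list(range(1, 501))
srs = simple_random_sample(frame, 10, seed=1)
print(srs)
```

Labeling the individuals and letting the software pick the labels plays the same role as reading labels from a table of random digits.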

Definitions (cont’d.) • Stratified Random Sample: • Steps: • Population is divided into groups of similar individuals called strata • An SRS is chosen from each stratum • The SRSs are combined into one sample • Reasons: • To reduce the variation of the estimators • Administrative convenience • Less expensive • Estimates are needed for “subgroups” of the population
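The three steps listed for a stratified random sample can be sketched in Python as follows. This is only an illustration under assumed data: the student list, the "junior"/"senior" strata, and the function name stratified_random_sample are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(individuals, stratum_of, sizes, seed=None):
    """Divide into strata, take an SRS from each stratum, combine them."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for individual in individuals:
        strata[stratum_of(individual)].append(individual)
    # Steps 2 and 3: SRS within each stratum, combined into one sample.
    sample = []
    for name, n in sizes.items():
        sample.extend(rng.sample(strata[name], n))
    return sample

# Hypothetical population: 100 students, each tagged with a class year.
students = [(i, "junior" if i % 2 else "senior") for i in range(100)]
sample = stratified_random_sample(
    students,
    stratum_of=lambda s: s[1],
    sizes={"junior": 5, "senior": 5},
    seed=7,
)
print(sample)
```

Fixing the number drawn from each stratum guarantees that every subgroup is represented, which a single SRS of the whole population cannot promise.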

Definitions (cont’d.) • Multistage sample design: selects successively smaller groups within a population in stages. • Undercoverage occurs when some groups in the population are left out of the process of choosing the sample. • Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to cooperate. • Response bias refers to a variety of things that can lead to an incorrect or false response.

Final Thoughts: • The wording of the question can greatly influence the response. • A poorly worded question can confuse those who are attempting to answer it.

5.2 Designing Experiments (p. 265-284) An Overview • There are good and bad techniques for producing data. • Random sampling and randomized comparative experiments are important and effective statistical practices. • The use of chance is vital in statistical design.

Concepts and Definitions • In an observational study, NO treatment is imposed on the individuals in the study. • Variables of interest are measured, usually over a period of time. • In an experiment, treatment is imposed on the individuals in the study. • Responses to the treatment are observed.

Definitions (cont’d.) • Experimental units are individuals on which the experiment is performed. • i.e. participants in the experiment • A treatment is a specific experimental condition that is applied to the experimental units. • A placebo is a dummy treatment that can have no physical effect on an experimental unit. • Commonly called a “sugar pill.”

Definitions (cont’d.) • The control group receives the placebo. • This group helps the experimenter to control the effects of any lurking variables. • The treatment group receives the treatment.

Definitions (cont’d.) • Completely randomized experimental design: All experimental units are allocated at random among all of the treatments. • Statistically significant: An observed effect so large that it would rarely occur by chance alone.
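A completely randomized design can be sketched in a few lines of Python. The 20 numbered units, the two treatment names, and the function name completely_randomized_design are assumptions for illustration only.

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 20 numbered units, two treatments.
units = list(range(1, 21))
groups = completely_randomized_design(units, ["treatment", "placebo"], seed=3)
for name, members in groups.items():
    print(name, sorted(members))
```

Dealing the shuffled units out in turn keeps the group sizes as equal as possible while leaving the membership of each group entirely to chance.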

Three Principles of Experimental Design • CONTROL • Needed to counter the effects of lurking variables. • Comparison is the simplest form of control. • Experiments should compare two or more treatments in order to avoid confounding the effect of the treatment with some other influence. • RANDOMIZATION • Subjects are assigned to treatments by pure chance. • Creates groups that are similar (except for chance variation) • A table of random digits can be used to choose the units for each group • REPLICATION • The experiment should be done on many subjects to reduce chance variation in the results.

Definitions (cont’d.) • In a double-blind experiment, neither the subjects nor the people who have contact with them know which treatment a subject is receiving. • A block design • Minimizes variation. • Block: group of experimental units or subjects that are similar in ways that are expected to affect the response to the treatments. • Treatments are assigned randomly within each block. • A form of control.
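Random assignment within blocks can be sketched as follows. The subject labels, the choice of sex as the blocking variable, and the function name randomized_block_design are hypothetical and serve only to illustrate the idea.

```python
import random
from collections import defaultdict

def randomized_block_design(units, block_of, treatments, seed=None):
    """Assign treatments at random separately within each block."""
    rng = random.Random(seed)
    # Form the blocks from a characteristic expected to affect the response.
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for block_units in blocks.values():
        rng.shuffle(block_units)  # randomize only inside this block
        for i, unit in enumerate(block_units):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects: 12 women and 8 men, blocked by sex.
subjects = [(f"S{i:02d}", "female" if i < 12 else "male") for i in range(20)]
assignment = randomized_block_design(
    subjects, block_of=lambda s: s[1], treatments=["treatment", "placebo"], seed=5
)
for subject, treatment in sorted(assignment.items()):
    print(subject, treatment)
```

Because the shuffle happens inside each block, every block contributes equally to each treatment group, so the blocking variable cannot be confounded with the treatment.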

Definitions (cont’d.) • Matched pairs: • Common form of blocking • Compares two treatments • The members of each pair are “alike” • Common forms: • Random assignment within pairs • In each pair, one member receives the treatment and the other receives the placebo • Pairs are observed at a later time to see if the treatment had any effect • Test scores from a before-after situation • Each individual • Takes a before-test • Receives some type of treatment • Takes an after-test • Purpose: to see if the treatment improves test performance

5.3 Simulation Experiments (p. 286-296) An Overview • Empirical probabilities relating to real-life situations can be obtained by simulation • Chance outcomes can be imitated by using • Random number generators • Tables • Calculators • Computers • Dice • Cards • Spinners

Simulation • The imitation of chance behavior in an attempt to gain information about a real-life situation. • randInt( can be used on your TI-84 Plus to generate random integers.

Steps in Creating a Simulation Model • State the problem or describe the experiment. • State the assumptions. • Assign digits to represent outcomes. • Simulate many repetitions. • State your conclusions.
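The five steps above can be walked through in a short Python sketch. The free-throw scenario, the 70% success rate, and the digit assignment are invented for illustration and are not taken from the textbook; rng.randint(0, 9) stands in for a calculator command such as randInt(0,9).

```python
import random

# Step 1 (problem): a hypothetical player makes 70% of her free throws.
#   How likely is she to make at least 8 of 10 shots?
# Step 2 (assumptions): shots are independent, each with a 70% chance.
# Step 3 (assign digits): digits 0-6 mean "make", digits 7-9 mean "miss".

def one_repetition(rng, shots=10):
    """Simulate one set of shots and return the number made."""
    digits = [rng.randint(0, 9) for _ in range(shots)]
    return sum(1 for d in digits if d <= 6)

def estimate(repetitions=10_000, seed=None):
    """Step 4: run the simulation many times and record the successes."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(repetitions) if one_repetition(rng) >= 8)
    return hits / repetitions

# Step 5 (conclusion): report the estimated probability.
p_hat = estimate(seed=2024)
print(f"Estimated probability of at least 8 makes: {p_hat:.3f}")
```

More repetitions give a more stable estimate; a handful of repetitions can land far from the long-run value.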

When Trials are Completed • Determine the empirical probability by calculating a ratio • Number of trials in which the outcome of interest occurred divided by the total number of trials.
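The ratio described above is easy to compute once the trial outcomes are recorded. The following Python sketch assumes a hypothetical experiment (rolling two dice and watching for a sum of 7); the function name empirical_probability is likewise an illustration, not a standard library call.

```python
import random

def empirical_probability(outcomes, is_event):
    """Trials in which the event occurred divided by total trials."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(1 for o in outcomes if is_event(o)) / len(outcomes)

# Hypothetical trials: 10,000 rolls of two dice, recording the sum.
rng = random.Random(11)
sums = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(10_000)]
p_seven = empirical_probability(sums, lambda total: total == 7)
print(f"Empirical probability of rolling a 7: {p_seven:.3f}")
```

With many trials the result should land near the theoretical value of 1/6, though any single run will differ from it slightly.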
