Producing Data: Samples and Experiments Chapter 5
Simple Random Sample • number the population • use a method to randomly select the desired sample size from entire population Advantages: every member of population always has equal chance of being selected Disadvantages: sample may not be representative of population; difficult with large populations
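A minimal Python sketch of the two steps above, assuming a hypothetical numbered population of 500 members and a sample size of 50. The helper name `simple_random_sample` is illustrative and not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Step 1: number the population (hypothetical: members 1..500).
population = list(range(1, 501))

# Step 2: randomly select the desired sample size from the entire population.
sample = simple_random_sample(population, 50, seed=1)
print(sorted(sample)[:10])
```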
Cluster Random Sample • divide population into clusters • use a method to randomly select one or more clusters • use a method to randomly select from the chosen clusters Advantages: can work well if population is easy to divide or there are established clusters Disadvantages: not everyone has equal chance of being chosen; selected clusters may not be representative of population
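The three cluster-sampling steps can be sketched in Python as follows. The clusters here (ten homerooms of twenty students each) are made up for illustration, as is the helper name `cluster_sample`.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick whole clusters, then randomly pick members inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # pick the clusters
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population already divided into clusters (homerooms).
clusters = {
    f"homeroom_{i}": [f"h{i}_student_{j}" for j in range(20)]
    for i in range(10)
}

picked = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=2)
for name, members in picked.items():
    print(name, members)
```

Students in the seven homerooms that are not chosen have no chance of appearing in the final sample, which is the disadvantage noted above.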
Stratified Random Sample • divide population into strata • use a method to randomly select a sample from each stratum Advantages: guarantees representation from each stratum Disadvantages: not everyone has equal chance of being chosen; strata (of interest) may be difficult to determine; population may be difficult/laborious to sort
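A short Python sketch of stratified sampling, assuming hypothetical strata (four grade levels of 100 students each) and an equal number drawn from each. The function name `stratified_sample` is illustrative.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate simple random sample inside every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical strata: students sorted by grade level.
strata = {
    grade: [f"grade{grade}_student_{i}" for i in range(100)]
    for grade in (9, 10, 11, 12)
}

by_grade = stratified_sample(strata, n_per_stratum=10, seed=3)
print({grade: len(members) for grade, members in by_grade.items()})
```

Unlike the cluster sketch, every group contributes to the sample, which is what guarantees representation.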
Systematic Random Sample • use sample size and population size to determine (estimate) “magic number” • use a method to randomly select number using “magic number” as range; add to determine corresponding selections Advantages: allows rapid method to select from large population; helps provide representation throughout population Disadvantages: not everyone has equal chance of being chosen; sample may not be representative
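One way to read the "magic number" procedure, sketched in Python: the magic number is taken to be the population size divided by the sample size (rounded down), a random starting position is drawn within that range, and the magic number is added repeatedly. The population of 1000 and sample of 50 are assumed values for illustration.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start within the first k members, then every k-th member."""
    rng = random.Random(seed)
    k = len(population) // n       # the "magic number" (sampling interval)
    start = rng.randrange(k)       # random starting index between 0 and k-1
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 1001))  # hypothetical numbered population
chosen = systematic_sample(population, 50, seed=4)
print(chosen[:5])
```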
Multi-Stage Random Sample • use a method (SRS, cluster, stratified) to randomly select (large) groups • use a method (SRS, cluster, stratified) to randomly select (smaller) groups • repeat until participants are chosen
Role of Sampling Design • Statistical inference provides ways to answer specific questions from data with some guarantee that the answers are good ones. • Statistical inference will be inaccurate if the method of collecting data is flawed.
Other Sampling Designs • Suppose Mr. Padavil is interested in finding out if Hendrickson students think more trees should be planted. He makes an announcement and instructs students to come by his office to let him know if tree planting is an issue they support. Will this sample of students give him an accurate picture of all students’ feelings at Hendrickson?
Other Sampling Designs • A voluntary response sample consists of people who choose themselves by responding to a general appeal. • Voluntary response samples overrepresent people with strong opinions. • This means the sample proportion tends to misstate the true population proportion, often by overstating it.
Other Sampling Designs • Mr. Padavil is surprised to find that most of the students coming into his office are in favor of the tree planting. Suspecting that his design may not have worked, he ventures into the hallways during passing periods and starts asking students randomly. Will this sample of students give him an accurate picture of all students’ feelings at Hendrickson?
Other Sampling Designs • A convenience sample consists of people who are chosen because they are easy and convenient to reach. • Convenience samples do not give every member of a population an equal chance of being chosen.
Defining Important Terms • population: the entire group of individuals we want information about. • sample: a part of the population that we actually examine in order to gather information. • sample design • good: simple random sample, cluster, stratified, systematic • poor: voluntary response, convenience sampling • bias: systematically favoring an outcome. • A poor sample design will systematically favor certain outcomes or results.
A sociologist wants to know the opinions of employed adult women about government funding for day care. She obtains a list of the 520 members of a local business and professional women’s club in Dallas and mails a questionnaire to 100 of these women selected at random. Only 48 questionnaires are returned. • What is the population in this study? • What is the sample?
Random-random sample practice • Sampling methods: simple random sample, convenience sample, cluster sample, voluntary response, systematic sample, stratified sample • Populations: McCallum seniors, UT alumni, Blender magazine subscribers, Texans, national pet stores, Austin middle students
Cautions about sample surveys • Suppose we use a random sample in a survey. What could still distort our results? • undercoverage • the issue occurs when a sampling design misses a part of the population • nonresponse • the issue occurs when a significant part of the chosen sample cannot be reached or refuses to participate in the survey
Cautions about sample surveys • response bias • the issue occurs when the person asking the question makes the respondent uncomfortable and possibly influences their answer • wording of questions • the issue occurs when a question is leading and attempts to persuade a respondent toward a particular answer • Remember: sample results do not necessarily match the population.
Identify potential problems • To obtain a sample of households, a television rating service dials phone numbers taken at random from a telephone directory.
Identify potential problems • Sports Illustrated magazine sent a mail-in questionnaire to 500 randomly selected subscribers. One of the questions was the following: “Knowing that the cover price would likely increase, would you prefer the number of advertisements in the magazine to be limited?”
Identify potential problems • For a survey of student opinions about high school athletic programs, a member of the school board obtains a random sample of students by listing all high school students and using a random number table to select 30 of them. After making phone calls last weekend, she notes six of the students said that they didn’t have time to participate in the survey.
You are on the staff of a member of Congress who is considering a bill that would provide government-sponsored insurance for nursing home care. You report to her that 1128 letters have been received on the issue, of which 871 oppose the legislation. “I’m surprised that most of my constituents oppose the bill. I thought it would be quite popular,” says the congresswoman. • Are you convinced that a majority of the voters oppose the bill? • How would you explain the statistical issue to the congresswoman?
Role of mathematics in sampling • Results will differ from sample to sample. This phenomenon is called sampling variability. • Because we deliberately use chance, the results obey the laws of probability, which makes them fairly consistent (within a margin of error). • The degree of accuracy can be improved by increasing the size of the sample.
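To make sampling variability concrete, the sketch below repeatedly draws samples from a hypothetical population in which 60% hold some opinion and compares how much the sample proportions spread out for samples of 25 versus samples of 400. The population proportion, the sample sizes, and the number of repetitions are all assumed values chosen for illustration.

```python
import random
import statistics

def sample_proportions(p, n, repetitions, seed=None):
    """Simulate `repetitions` random samples of size n; return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(repetitions)
    ]

small = sample_proportions(p=0.6, n=25, repetitions=500, seed=3)
large = sample_proportions(p=0.6, n=400, repetitions=500, seed=3)

# The spread (standard deviation) of the sample proportions is the variability.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(f"n=25:  spread = {spread_small:.3f}")
print(f"n=400: spread = {spread_large:.3f}")
```

The results still differ from sample to sample at both sizes, but the larger samples cluster more tightly around the population value.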
Designing Experiments: vocab • Vocabulary shift from algebra to statistics • Independent variable (algebra) = Explanatory variable (statistics) • Dependent variable (algebra) = Response variable (statistics) • An explanatory variable is also called a “factor.”
Example for vocabulary check • A corporation found that technology trainings were often stressful for its employees. One idea was to play background music (jazz or classical). Another idea was to have the presenter and participants dress casually rather than in the usual business attire. Equivalent technology trainings over the next year were randomly assigned a particular condition. A post-training survey was given to measure the stress associated with each training.
Example for vocabulary check: State the factors, levels, and number of treatments • Factors: music, attire • Levels: music (3), attire (2) • Treatments: 3 × 2 = 6
Discussion example 1 • One school board member noticed that students in band tended to be in the top 25% of their school. She compiled a list from each high school’s band director and took a random sample of 25 students from each school’s band. She then took a random sample of 25 students who weren’t in band from each high school. She found a slightly higher average G.P.A. among students in band.
Discussion example 1 • Will this study give evidence that being in band causes an increase in a student’s G.P.A.? • Will this study help her generalize that students in band tend to have a slightly higher G.P.A. than students not in band?
Vocabulary from example 1 • Observational study • a study based on data collected from individuals who meet predetermined criteria, without imposing any treatment • Lurking variable • an outside factor that is neither the explanatory nor the response variable • prevents causal relationships from being established in observational studies
Discussion example 2 • Walmart is considering buying a gasoline additive that is supposed to improve gas mileage. They found 30 employees in Texas who drive the same car. Fifteen employees are randomly selected to receive the additive; the remaining fifteen are given a bottle with just gas. Each employee is given a set route around the city to drive. The gas mileage is recorded by an onboard computer, which shows the additive gives the driver 12% better gas mileage.
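The treatment count is the number of ways to combine one level of each factor. The Python sketch below enumerates them; the slide gives only the counts, so the level names, in particular "none" as the third music level and "business" as the baseline attire, are assumptions.

```python
from itertools import product

# Assumed level names; only the counts (3 and 2) come from the slide.
factors = {
    "music": ["none", "jazz", "classical"],
    "attire": ["business", "casual"],
}

# A treatment is one combination of a level from every factor.
treatments = list(product(*factors.values()))

for music, attire in treatments:
    print(f"music={music}, attire={attire}")
print("number of treatments:", len(treatments))
```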
Discussion example 2 • Will this study give evidence that using the additive will give a car better gas mileage?
Vocabulary from example 2 • Experiment • a planned study where deliberate conditions are imposed to see how the response variable will change • Confounding variable • a variable associated (noncausal) with the explanatory variable that affects the response variable in some way • makes it difficult to tell if the treatment or the confounding variable affected the response variable significantly
Lurking versus confounding • [Diagram with two panels: “Observational study” (lurking variable z alongside x and y) and “Experiment” (confounding variable z alongside x and y)]
Example 3 • A baby-food producer claims that her product is superior to that of her leading competitor, in that babies gain weight faster with her product. For the experiment, 30 healthy babies are randomly selected. • Using a diagram, outline an experiment.
Completely Randomized Design • Random allocation • Group 1: 15 babies, Treatment 1 (her product) • Group 2: 15 babies, Treatment 2 (competitor’s product) • Compare weight gain • Babies will be numbered 01 to 30. Using a random number table and taking two digits at a time (skipping repeats and numbers outside 01 to 30), the first 15 selected will be in Group 1, with the remaining 15 placed in Group 2. Each baby’s weight gain will be measured in pounds and compared.
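The allocation described above can be imitated in Python. This sketch swaps the random number table for a pseudo-random shuffle, which serves the same purpose; the function name `completely_randomized` is illustrative.

```python
import random

def completely_randomized(units, seed=None):
    """Shuffle all units, then split them into two equal treatment groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)            # stands in for reading a random number table
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

babies = [f"{i:02d}" for i in range(1, 31)]   # labels 01 to 30
group1, group2 = completely_randomized(babies, seed=7)

print("Group 1 (her product):", sorted(group1))
print("Group 2 (competitor's):", sorted(group2))
```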
Example 4 • We wish to determine whether or not a new type of fertilizer is more effective than the type currently in use. Researchers have subdivided a 20-acre farm into twenty 1-acre plots. Wheat will be planted on the farm, and at the end of the growing season the number of bushels harvested will be measured. • Produce a diagram of the experiment.
Completely Randomized Design • Random allocation • Group 1: 10 one-acre plots, Treatment 1 (new type) • Group 2: 10 one-acre plots, Treatment 2 (current type) • Compare bushels • Land plots will be numbered 01 to 20. Using a random number table and taking two digits at a time (skipping repeats and numbers outside 01 to 20), the first 10 selected will be in Group 1, with the remaining 10 placed in Group 2. The bushels of wheat from each plot will be counted and compared.
An example of a good design? • In order to test the effectiveness of nicotine patches, Dr. Hurt recruited 240 smokers at various locations. Volunteers were to receive a 22-mg nicotine patch for eight weeks. Almost half (46%) of the nicotine group had quit smoking at the end of the study. • Confounding variable: placebo effect
Principles of Experimental Design • Control: using comparison ensures that outside factors other than the experimental treatments operate equally on all groups. • Randomization: use impersonal chance to assign subjects to treatments so that unanticipated factors are equalized and the groups are similar in all respects. • Replication: perform the experiment on many subjects to reduce chance variation in the results.
Randomized Comparative Experiments • Goal of an experiment: collect statistically significant evidence for a cause-and-effect relationship. • The success of an experiment depends on our ability to treat all the experimental units identically except for the actual treatment.
Design Example 5 • You are participating in the design of a medical experiment to investigate whether a calcium supplement in the diet will reduce the blood pressure of middle-aged men. Preliminary work suggests that calcium may be effective and that the effect may be greater for African-American men than for white men. • Describe a completely randomized design given that you have 100 men, 50 white and 50 African-American.
Design example • Random assignment • Group 1: 50 men, Treatment 1 (calcium) • Group 2: 50 men, Treatment 2 (placebo) • Compare blood pressure • What potential problems might we have because we started with random assignment? • How should we alter our experiment?
Block Design • All participants are split into two blocks • Block 1: 50 African-American men, completely randomized experiment • Block 2: 50 white men, completely randomized experiment
Block Design • Subjects are split into two blocks • Block 1: 50 African-American men, random assignment to Group 1 (25 men, Treatment 1: calcium) and Group 2 (25 men, Treatment 2: placebo) • Block 2: 50 white men, random assignment to Group 3 (25 men, Treatment 1: calcium) and Group 4 (25 men, Treatment 2: placebo) • Compare blood pressure • All African-American men will be assigned a random number. The half with the smallest numbers will be assigned to Group 1, and the half with the largest numbers will be assigned to Group 2. The process will repeat for the white men (Groups 3 and 4). The reduction in blood pressure will be compared.
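A Python sketch of the block design: the random assignment is carried out separately inside each block, so each treatment gets 25 men from each block. The subject labels and the function name `block_randomize` are hypothetical, and a pseudo-random shuffle stands in for assigning random numbers and sorting.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Within each block, randomly split the members between two treatments."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)                 # randomize inside this block only
        half = len(shuffled) // 2
        assignment[block_name] = {
            treatments[0]: shuffled[:half],
            treatments[1]: shuffled[half:],
        }
    return assignment

# Hypothetical subject labels for the two blocks of 50 men.
blocks = {
    "African-American": [f"A{i:02d}" for i in range(1, 51)],
    "White": [f"W{i:02d}" for i in range(1, 51)],
}

assignment = block_randomize(blocks, ("calcium", "placebo"), seed=4)
for block_name, groups in assignment.items():
    print(block_name, {t: len(men) for t, men in groups.items()})
```

A single shuffle of all 100 men could, by chance, put most men of one race into the calcium group; randomizing within blocks rules that out.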
Improving the Design • A block is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. • Block design has the same rationale as a stratified random sample. • Blocks allow us to reduce the amount of variation to improve the accuracy of our conclusions by creating homogeneous groups. • single blind versus double blind
Design Example 6 • Is the right hand of right-handed people generally stronger than the left? Paul Murky of Murky Research designs an experiment to test this question. He fastens an ordinary bathroom scale to a shelf five feet from the floor, with the end of the scale projecting out from the shelf. Subjects squeeze the scale between their thumb and their fingers on the top. The scale reading in pounds measures hand strength. • Is a completely randomized experiment appropriate?
Matched pair Design • Random allocation into Group 1 and Group 2 • Each group receives both Treatment 1 (left hand) and Treatment 2 (right hand), in opposite orders • Compare difference • All participants will be assigned a random number. The half of the subjects with the smallest numbers will be assigned to Group 1, and the half with the largest numbers will be assigned to Group 2. A coin will be flipped to decide which group gets Treatment 1 first: the group that gets heads will squeeze with the left hand first, and the group that gets tails will squeeze with the right hand first. The difference in pounds between each subject’s two scale readings will be compared.
Improving the Design • In a matched pair design, each subject in the experiment will receive two (and only two) treatments. • The order that each subject receives both treatments is randomly selected to preserve the important aspect of randomization.
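A minimal Python sketch of randomizing the treatment order, assuming ten hypothetical subjects and a separate virtual coin flip for each one. This is one possible way to randomize the order, not necessarily the exact procedure on the slide.

```python
import random

def assign_hand_order(subjects, seed=None):
    """Give every subject both treatments, with the order chosen by chance."""
    rng = random.Random(seed)
    orders = {}
    for subject in subjects:
        first = rng.choice(["left", "right"])          # the coin flip
        second = "right" if first == "left" else "left"
        orders[subject] = (first, second)
    return orders

subjects = [f"subject_{i:02d}" for i in range(1, 11)]  # hypothetical labels
orders = assign_hand_order(subjects, seed=6)
for subject, order in orders.items():
    print(subject, "squeezes", order[0], "hand first, then", order[1])
```

Because each subject serves as their own comparison, the analysis looks at the right-minus-left difference for each person rather than at two separate group averages.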
Why a simulation? • A simulation uses a model to imitate chance behavior in a specific problem situation. • A simulation allows a model to be analyzed when a theoretical probability is unknown or difficult to determine.
Elements of a simulation • Number assignment • Description of a trial • Stopping rule • Execution of simulation (marking of the number line) • Documentation of results
Simulation Example • Traffic Lights: Coming to school each day, Anne rides through three traffic lights, A, B, and C. The probability that any one light is green is 0.3, and the probability that it is not green is 0.7. Use a simulation to answer the questions below. • We must assume that the lights operate independently. • Estimate the probability that Anne will find all traffic lights to be green. • Estimate the probability that Anne will find at least one light to be not green.
Simulation Example • Number assignment • 0 – 2 green light; 3 – 9 not green • (alternative: 1 – 3 green light; 4 – 9 and 0 not green) • Description of a trial/Stopping rule • A trial consists of choosing three digits, one at a time, with each digit representing one traffic light. For each digit, record whether the light is green or not green. The trial ends after three lights. • Execution of simulation • Documentation of results
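The same simulation can be run many times in Python instead of reading digits from a table. This sketch uses the first number assignment (digits 0 to 2 mean green) and an assumed 10,000 trials; the function name is illustrative. Under the independence assumption, the exact answers are 0.3³ = 0.027 and 1 − 0.027 = 0.973, so the estimates should land near those values.

```python
import random

def simulate_traffic_lights(trials, seed=None):
    """Estimate P(all three lights green) and P(at least one not green)."""
    rng = random.Random(seed)
    all_green = 0
    for _ in range(trials):
        # One trial: three random digits, one per light (A, B, C).
        digits = [rng.randrange(10) for _ in range(3)]
        if all(d <= 2 for d in digits):     # digits 0-2 represent a green light
            all_green += 1
    p_all_green = all_green / trials
    p_some_not_green = (trials - all_green) / trials
    return p_all_green, p_some_not_green

p_all_green, p_some_not_green = simulate_traffic_lights(10_000, seed=5)
print(f"all green:              {p_all_green:.3f}")
print(f"at least one not green: {p_some_not_green:.3f}")
```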