Chapter One, Part 2 1.4: Critical Thinking in Statistics 1.5: Collecting Sample Data
Quick Review • We typically have a VERY LARGE set of individuals (called the Population), but we cannot obtain data from every individual. • We choose a subset of the Population (this subset is called a Sample), and gather data from those individuals. • We use data from the Sample to draw conclusions about the Population. For this to work, the data must be gathered in an appropriate way. • There are many “bad” ways to sample!
Sampling Bias • Ideally, we want our sample to be representative of the overall population. • If the way we choose the sample and/or gather data from the chosen individuals… • …is more likely to include a certain type of individual or produce a certain type of response, or… • …is less likely to include a certain type of individual or produce a certain type of response… • …then our analysis/inference might not give accurate results about the intended population. • This is called bias. Examples to follow.
Examples of Bias • Estimate average class height using a sample of students from the front row. • Estimate average class height using a sample of male students. • Study the effectiveness of a weight-loss diet using a sample of professional athletes. • Estimate what percent of Americans approve of the president using a sample of voters from only one political party. 08/15/11
Common Types of Sampling Bias • A voluntary response sample occurs when the individuals to be studied have control over whether or not they are included in the sample. • This is also called “self-selection bias.” • A convenience sample occurs when the researcher is more likely to choose individuals who are easier to obtain data from. • The researcher might not be aware of this! • Small sample: Using too few individuals increases the chance of getting a sample that consists only of “unusual” individuals.
Example: Voluntary Response • Ratemyprofessors.com is a website that collects information about college professors from their students. • The ratings come from students volunteering to create an account and submit information. • Question: What kind of students are likely to volunteer?
Voluntary Response Bias • Answer: Students with stronger opinions are more likely to volunteer a response. • Fact: In many “customer satisfaction” surveys, those with a strong negative opinion are most likely to volunteer. Those with a neutral opinion are least likely to volunteer. • There is potential bias: those in the sample are more likely to have a negative opinion than the entire population.
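The effect described on this slide can be illustrated with a small simulation. This is only a sketch: the ratings and the response rates below are made-up numbers chosen for illustration, not data from any real survey. The population mean is 3 by construction, so any gap between the two printed means comes from who chooses to respond.

```python
import random

# Hypothetical population of 10,000 ratings on a 1-5 scale (not real data).
population = [1] * 1000 + [2] * 2000 + [3] * 4000 + [4] * 2000 + [5] * 1000

# Assumed chance that a person with each rating volunteers a response:
# strong negative opinions respond most often, neutral opinions least often.
response_rate = {1: 0.60, 2: 0.30, 3: 0.05, 4: 0.15, 5: 0.30}

rng = random.Random(0)  # fixed seed so the run is repeatable
volunteers = [r for r in population if rng.random() < response_rate[r]]

population_mean = sum(population) / len(population)
volunteer_mean = sum(volunteers) / len(volunteers)

print(f"population mean rating: {population_mean:.2f}")
print(f"volunteer mean rating:  {volunteer_mean:.2f}")
```

Under these assumed response rates, the volunteers' average rating comes out lower than the population's, which is the kind of bias the slide warns about.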
Example: Convenience Sample • I want to determine the average SAT Math score of all current Clayton State students. • For my sample data, I choose ten students from the current class and compute the average SAT Math score for those ten. • Why might this lead to inaccurate results? • Although my intended population is all CSU students, I picked only from a small part of the population (that was most convenient for me).
Other Common Problems • Some types of bias occur not in choosing the sample, but in gathering data from the chosen individuals. • Misreported data: Individuals may give inaccurate answers (perhaps unintentionally) when asked a certain question. • Example: How much do you weigh today? • Example: How many hours per week do you study? • Question wording: Variations in the wording of a question can greatly influence people’s responses. Compare: • Should the government spend more money on public education? • Should the government spend more of your tax dollars on public education?
Good Ways to Sample • We almost always want some degree of randomness when choosing our sample. • What does “randomness” actually mean? We’ll answer this in Chapters 4 and 5. • Randomly-chosen samples reduce the potential for biased results. However, complete randomness is often impossible to achieve in practice.
** Simple Random Sample ** • All of the statistical inference in this course will assume that data comes from a Simple Random Sample (SRS). This means… • Before choosing individuals, we decide how many we want to use. This is called the sample size, usually denoted by the letter n. • We must choose the sample in such a way that each group of n individuals (from the overall population) is equally likely to be picked.
** Simple Random Sample ** • If you’ve played any sort of card game, you have seen an SRS. • Be sure the deck of cards is well-shuffled. • Deal a hand (of a given size, often 5 cards). This is an SRS. • Question: What is the population? • Other methods for obtaining an SRS: • Draw names from a hat. • Use a computer to pick “random” numbers.
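As a sketch of the "use a computer" method, Python's standard `random.sample` draws n distinct items so that every group of n is equally likely. The roster of 40 students and the helper name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every group of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: a roster of 40 students.
roster = [f"Student {i}" for i in range(1, 41)]

# Decide the sample size first (n = 5), then draw the sample.
chosen = simple_random_sample(roster, 5, seed=1)
print(chosen)
```

This is the computer version of drawing names from a hat: no individual can be picked twice, and the sample size is fixed before anything is drawn.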
Other “Good” Samples • Random Sample: Each individual from the population has an equal chance of being chosen for the sample (every SRS is a Random Sample, but not every Random Sample is an SRS). • Example (Random Sample): Each student flips the same coin. If your coin is… • …heads, you are included in the sample. • …tails, you are not included in the sample. • Each student has the same chance of being included. Why does this NOT produce an SRS?
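One way to explore the question on this slide is to simulate the coin-flip scheme many times and look at how large the resulting samples are. This is a sketch with a hypothetical class of 20 students; the helper name `coin_flip_sample` is an illustration, not part of the course material.

```python
import random

def coin_flip_sample(population, rng):
    """Include each individual independently with probability 1/2."""
    return [person for person in population if rng.random() < 0.5]

rng = random.Random(0)  # fixed seed so the run is repeatable
students = [f"Student {i}" for i in range(1, 21)]  # hypothetical class of 20

# Repeat the scheme 1000 times and record the size of each sample.
sizes = [len(coin_flip_sample(students, rng)) for _ in range(1000)]
print("smallest sample:", min(sizes))
print("largest sample: ", max(sizes))
```

The sample size changes from run to run, so it cannot be fixed in advance as the definition of an SRS requires.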
Other “Good” Samples • Stratified Sample: Divide the population into mutually exclusive groups (strata), and choose a sample (often an SRS) from within each group. • This is often done if we want to account for some kind of population demographics. • Example: If our population is 60% women and 40% men, we might choose a sample of 30 women and 20 men. Sample demographics match those of the population.
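The 60%/40% example can be sketched in code as follows. The population of 600 women and 400 men is hypothetical, as is the helper name `stratified_sample`; the sketch assumes each stratum's share of the sample should match its share of the population, and it draws an SRS within each stratum.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw an SRS from each stratum, sized in proportion to the stratum.

    Note: rounding means the sizes may not add up to total_n exactly
    when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60% women, 40% men.
strata = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}

sample = stratified_sample(strata, 50, seed=2)
print({name: len(members) for name, members in sample.items()})
```

With a total sample size of 50, this gives 30 women and 20 men, matching the slide's example.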
Other “Good” Samples • Cluster Sampling: Divide the population into mutually exclusive groups (clusters), and randomly choose a set of clusters. For each selected cluster, gather data from all individuals within that cluster. • Example: There are 17 sections of Math 1231 currently offered at Clayton State. Randomly choose 3 sections, and survey all students from each of those 3 sections.
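The Math 1231 example can be sketched as below. The section sizes (25 students each) and the student labels are made up for illustration, and `cluster_sample` is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then keep everyone in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical: 17 sections with 25 students each.
sections = {
    f"Section {s:02d}": [f"S{s:02d}-{i}" for i in range(25)]
    for s in range(1, 18)
}

surveyed = cluster_sample(sections, 3, seed=3)
for name, members in surveyed.items():
    print(name, len(members), "students surveyed")
```

The randomness is in which sections get picked; within a picked section, nobody is left out.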
How are the data obtained? • In addition to choosing a “good” sample, it is important to distinguish between the following two general scenarios: • Observational Study: Simply observe and/or measure individuals, without attempting to modify their characteristics or behavior. • Experiment: Deliberately impose a specific set of conditions (a treatment) on each individual. A valid experiment has more than one possible treatment, and we can compare results.
Example: Experiment vs. Observational Study • Question: Is there any sort of relationship between caffeine use and exam scores? • Here is an Observational Study: • Just before the exam, record how much caffeine each student has consumed today. • Record each student’s exam score. • It will probably be the case that different students consume different amounts of caffeine, but we do not deliberately try to create this difference.
Example: Experiment vs. Observational Study • Question: Is there any sort of relationship between caffeine use and exam scores? • Here is an Experiment: • Require students to consume no caffeine on exam day. • 15 minutes before the exam, give each student a cup of coffee. Some students will get regular coffee (with caffeine), others will get decaffeinated coffee. • Record each student’s exam score. • It will certainly be the case that different students consume different amounts of caffeine, because we deliberately created such a difference.
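The slide does not say how to decide which students get which coffee. One common choice, assumed here and not stated in the slides, is to assign treatments at random. The sketch below uses a hypothetical class of 30 students and a hypothetical helper `assign_treatments`.

```python
import random
from collections import Counter

def assign_treatments(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal out the treatments in turn.

    Random assignment is an assumption of this sketch; the slides do
    not specify how students are split between the two coffees.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {
        subject: treatments[i % len(treatments)]
        for i, subject in enumerate(shuffled)
    }

students = [f"Student {i}" for i in range(1, 31)]  # hypothetical class
assignment = assign_treatments(students, ["regular", "decaf"], seed=4)

# Count how many students received each treatment.
print(Counter(assignment.values()))
```

Dealing treatments out in turn after shuffling keeps the two groups the same size, here 15 students each.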
Questions for Discussion • In the Experiment, why not give all students a cup of (caffeinated) coffee? • In the Experiment, why not use “regular coffee” versus “no coffee”? • What are some advantages/disadvantages of the Study versus the Experiment? • Can you think of a scenario where it would not be possible to do an Experiment?