Sample Design AP Statistics
Quick definitions • Response Variable / Dependent Variable (the output) • Explanatory Variables / Independent Variables (the inputs) Example: We want to test the fuel economy of a car. The response variable would be kilometers per liter. The explanatory variables could be engine capacity, number of cylinders, etc.
Quick definitions • Experiment: a study in which one or more explanatory variables are manipulated to observe the effect on a response variable • Experimental Condition (a.k.a. Treatment): a particular combination of values of the explanatory variables.
Bias is introduced by the way the sample is selected OR the way data is collected • Bigger samples do not reduce bias: a biased selection or measurement method stays biased no matter how large the sample is.
Types of Sampling Bias
Selection Bias • The way the sample is selected excludes part of the population • For example, phoning people for a survey excludes the homeless or those without telephones, so it could not be generalized to talk about EVERYONE • Any study that calls for volunteers has selection bias
Measurement Bias (also called Response Bias) • The method of observation produces values that are different from the truth in some way • Example: forgot to calibrate a scale • Example: worded a question in a way that influenced responses (ex. “Disposable diapers make up less than 2% of landfill trash, while beverage containers, 3rd-class mail and yard waste are about 21% of all landfill trash. Given that, would it be fair to tax or ban disposable diapers?”)
Types of Sampling Bias
Selection Bias • When the survey doesn’t represent the population • Undercoverage: part of the population is left out of the sampling frame. Example: in 1936 the Literary Digest polled its readers, along with people listed in telephone directories and car registration records, on who would win the US election: FDR vs. Landon. Respondents picked Landon, but FDR won. Lower-income workers, who often didn’t have a telephone or a car registration, were underrepresented. • Nonresponse Bias: when large numbers of surveyed people don’t respond. Mail surveys often suffer from this. • Voluntary Response Bias: volunteers on a survey tend to overrepresent people with strong opinions.
Response Bias • Problems in measurement • Leading Questions. EX. Hillary Clinton voted to go to war in Iraq. Could you see yourself voting for Clinton in the next election? • Social Desirability: people are reluctant to admit illegal activities or attitudes that are not acceptable in society if the results are not confidential.
Study Description: Reasonable to generalize to the entire population? / Reasonable to draw a cause-and-effect conclusion?
• Observational study with sample selected at random from a population of interest: Yes / No
• Observational study based on convenience or volunteers: No / No
• Experiment (groups formed by random assignment) in which individuals are not randomly selected: No / Yes
• Experiment (groups formed by random assignment) where people are randomly selected: Yes / Yes
• Experiment where groups are not formed by random assignment to experimental conditions: No / No
How can we eliminate bias? • Easiest way: Randomize the sample selection • IMPORTANT: the proportion of the population that is sampled DOES NOT MATTER when talking about generalizing a study, only the sample size. Get a sample big enough to talk about shape, center, and spread, and if the sample and procedure aren’t biased, you’re OK to generalize.
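To see why the sample size matters far more than the fraction of the population sampled, here is a minimal sketch. It assumes a simple random sample drawn without replacement and uses the usual standard error of a sample proportion with the finite population correction. The function name and all the numbers are illustrative assumptions, not part of the slides.

```python
import math

def standard_error(p, n, population_size):
    """Standard error of a sample proportion for an SRS drawn without replacement."""
    # Finite population correction: close to 1 when n is a small share of the population.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

# Same sample size, very different population sizes (hypothetical numbers).
se_small_pop = standard_error(0.5, 1000, 100_000)        # sample is 1% of the population
se_large_pop = standard_error(0.5, 1000, 100_000_000)    # sample is 0.001% of the population

# Same huge population, but four times the sample size.
se_bigger_sample = standard_error(0.5, 4000, 100_000_000)

print(se_small_pop, se_large_pop, se_bigger_sample)
```

The first two values are nearly identical even though the populations differ by a factor of 1000, while quadrupling the sample size roughly halves the standard error.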
How can we eliminate bias? • Direct Control: making extraneous variables the same across conditions so they don’t affect the experiment. Example: A waitress decides to test whether or not writing “thank you” on a check increases her pay. An extraneous variable would be where the patron is sitting, so she could control this by only testing window seats.
How can we eliminate bias? • Replication: making sure that there is a large number of observations for each experimental condition. Example: Our waitress can’t have a successful experiment if she only tries one person with each treatment (thank you or no thank you). She needs to repeat both treatments until she has many observations for each.
How can we eliminate bias? • Blocking: Create groups (blocks) based on known outside variables, then test each group. • For example: An experiment is designed to test a new drug on patients. There are two levels of the treatment, drug and placebo, administered to male and female patients in a double-blind trial. The sex of the patient is a blocking factor that accounts for variability in treatment response between males and females.
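The blocking idea can be sketched in code: split the patients into blocks by sex, then randomly assign drug or placebo within each block. This is only an illustration under assumed data; the function name, the patient records, and the even split within each block are hypothetical choices, not something the slides specify.

```python
import random

def randomized_block_assignment(patients, block_key, treatments, seed=None):
    """Randomly assign treatments within each block, as evenly as possible."""
    rng = random.Random(seed)

    # Group the patients into blocks by the blocking variable.
    blocks = {}
    for patient in patients:
        blocks.setdefault(patient[block_key], []).append(patient)

    # Shuffle each block, then deal out the treatments in rotation.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, patient in enumerate(shuffled):
            assignment[patient["id"]] = treatments[i % len(treatments)]
    return assignment

# Hypothetical patients: ids 0-3 are male, ids 4-7 are female.
patients = [{"id": i, "sex": "M" if i < 4 else "F"} for i in range(8)]
assignment = randomized_block_assignment(patients, "sex", ["drug", "placebo"], seed=1)
print(assignment)
```

Because the random assignment happens separately inside each block, both males and females end up with the same number of drug and placebo patients.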
Simple Random Samples • Simple random sample: every possible sample of the chosen size has an equal chance of being selected, so every individual has an equal chance of being selected too. Procedure: create a sampling frame (numbered list of individuals in the population). Use a random number generator to select the sample.
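The procedure above can be sketched with Python's standard `random` module. The population of 500 numbered individuals, the sample size of 20, and the function name are assumptions made for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Select n distinct individuals so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: individuals numbered 1 to 500.
sampling_frame = list(range(1, 501))
srs = simple_random_sample(sampling_frame, 20, seed=42)
print(sorted(srs))
```

Passing a seed only makes the example repeatable; a real study would let the generator pick its own starting point.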
Stratified random sampling • Separate random samples are selected from each subgroup to ensure representation • This can offer valuable info about each subgroup individually for little extra work • Since each subgroup is largely homogeneous, the sample selected should be representative of the subgroup even if the sample size is smaller. • The subgroups of a stratified sample are called strata (singular: stratum).
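A minimal sketch of stratified random sampling follows: one separate simple random sample is drawn from every stratum. The grade-level strata, their sizes, and the function name are hypothetical, and taking the same number from each stratum is just one possible design.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate simple random sample from EVERY stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by grade level.
strata = {
    "freshmen": [f"FR{i}" for i in range(300)],
    "sophomores": [f"SO{i}" for i in range(250)],
    "juniors": [f"JR{i}" for i in range(200)],
}
stratified = stratified_sample(strata, 10, seed=3)
for name, chosen in stratified.items():
    print(name, chosen)
```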
Cluster Sampling • Individuals are divided into subgroups called clusters. • Clusters are selected at random for inclusion into the sample. Clusters not selected are not included.
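Cluster sampling can be sketched the same way. Here the random draw picks whole clusters, and everyone in a chosen cluster is included (one-stage cluster sampling, which is an assumption of this example). The homeroom names and sizes are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random; include everyone in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample

# Hypothetical clusters: 12 homerooms with 25 students each.
clusters = {f"homeroom_{h}": [f"student_{h}_{s}" for s in range(25)]
            for h in range(12)}
chosen_clusters, cluster_members = cluster_sample(clusters, 3, seed=5)
print(chosen_clusters, len(cluster_members))
```

Note the contrast with a stratified sample: only some groups are drawn, and individuals in the other groups have no chance of appearing once the clusters are chosen.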
Stratified Random Sampling vs. Cluster Sampling
Stratified Sampling • The groups are homogeneous (individuals in each group are largely the same) • We sample from EVERY group (“strata”)
Cluster Sampling • The groups are heterogeneous (include people of every type) • We don’t sample from every group (“cluster”)
Examples
Stratified Sampling • We want to know the average cost of malpractice insurance for doctors. • We split up doctors into 4 categories: Surgeons; Interns and family practitioners; Obstetricians; Others • We select random samples from each category to study
Cluster Sampling • We want to survey students about their access to college advisors • All students in the school (our entire population) have a homeroom • We select 3 homerooms at random, and the students in those homerooms make up the sample
Systematic Sampling • If everyone is already in a sequential order, then we can pull every kth person to use in the sample • Example: pull students and faculty from a phone book. Every 25th name will participate in the study. • This kind of sample is called a “1 in k systematic sample.” • For the example above, this is a 1 in 25 systematic sample.
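A 1 in k systematic sample can be sketched as follows. Choosing the starting position at random among the first k names is a common convention and an assumption here, since the slide does not say where to start. The list of 1000 names and the function name are hypothetical.

```python
import random

def systematic_sample(ordered_list, k, seed=None):
    """Take every k-th individual, starting at a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start in 0 .. k-1
    return ordered_list[start::k]

# Hypothetical phone book with 1000 names in order.
names = [f"name_{i:04d}" for i in range(1000)]
one_in_25 = systematic_sample(names, 25, seed=7)
print(len(one_in_25), one_in_25[:3])
```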
Convenience Sampling • When a sample is pulled based on what is readily available, such as volunteers • DON’T DO THIS. Convenience sampling is a mark of a bad study, as it has selection bias (more specifically, it usually has voluntary response bias: volunteers are MUCH more likely to be highly opinionated in a survey).
A quick note on replacement • Sampling with replacement means that once selected, participants can be selected again. • Sampling without replacement means that once selected, participants cannot be selected again. • The two give nearly the same results if the sample size is less than 10% of the population, though sampling without replacement is the more common practice.
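The difference between the two can be sketched with the standard `random` module: `choices` samples with replacement and `sample` samples without. The population of 1000 and the sample size of 50 (5% of the population) are assumed values for illustration.

```python
import random

rng = random.Random(0)
population = list(range(1000))  # hypothetical population of 1000 individuals

# With replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(population, k=50)

# Without replacement: every selected individual is distinct.
without_replacement = rng.sample(population, 50)

repeats = len(with_replacement) - len(set(with_replacement))
print("repeated draws with replacement:", repeats)
print("distinct individuals without replacement:", len(set(without_replacement)))
```

When the sample is a small share of the population, repeats are rare, which is why the two methods behave almost the same below the 10% guideline.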
What’s wrong with the survey below? • Psychologists working for the Health and Medicine Journal surveyed college students about their use of legal and illegal drugs. • The sample consisted of students enrolled in a psychology class at a small, competitive college in the USA.
Is this a simple random sample, stratified sample, systematic sample, or convenience sample? • Psychologists working for the Health and Medicine Journal surveyed college students about their use of legal and illegal drugs. • The sample consisted of students enrolled in a psychology class at a small, competitive college in the USA.
AP Challenge • Of 6500 students enrolled in a community college, 3000 are part-time and the other 3500 are full-time. The college can provide a list of students that is sorted so that all full-time students are listed first, followed by all part-time students. • Describe a procedure for selecting a stratified random sample that uses full-time and part-time students as the two strata and includes 10 students from each stratum. • Does every student at this college have the same chance of being selected for inclusion in the sample? Explain.