Chapter 12: Sample Surveys There is no recovery from poorly collected data!

mandell

Presentation Transcript


  1. Chapter 12: Sample Surveys There is no recovery from poorly collected data!

    "An approximate answer to the right question is worth a good deal more than the exact answer to an approximate question." John Tukey
  2. Overview With the ultimate goal of uncovering truths about a population, we discuss how to collect useful data from representative samples. We consider polls, surveys, and other means of gathering data, and introduce terminology and notation about populations and parameters. We examine the importance of random selection and sample size. We discuss several sampling designs, and many kinds of bias that can render our results meaningless.
  3. Designing Samples
    Statistical inference: Methods for giving "reasonable" answers to specific questions by examining data.
    Population: Group from which information is desired.
    Sample: Part of a population that is examined in an attempt to obtain information about the population.
  4. Population vs Sample
    Population: Group from which information is desired.
    Census: A sample that consists of the entire population. Practical when the population is small.
    Parameter: A number used in a model for a population.
    Sample: Part of a population that is examined in an attempt to obtain information about the population.
    Sample Survey: A study that asks questions of a sample of the population. Example: a poll taken to assess voter preferences.
    Statistic: A summary computed from the sample data.
  5. Population vs Sample We want the statistics we compute to reflect the corresponding parameters accurately. A sample that does this is said to be representative of the population.
  6. Size Matters The number of individuals (n) in the sample matters; the size of the population does not.
  7. Sample
    Sampling Frame: Individuals from whom the sample is drawn; they should be in the population of interest.
    Sampling Variability: We will not get the same sample each time. This is sometimes called sampling error, but it really isn't an error; it is the natural tendency of randomly drawn samples to differ (vary) from one another.
    Sample Size: The number of individuals in the sample. The fraction of the population that you've sampled doesn't matter; it is the sample size that determines how well the sample represents the population.
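Sampling variability and the effect of sample size can be seen in a small simulation. The following is only a sketch under stated assumptions: the population of 10,000 voters, the 60% "yes" rate, and the sample sizes 50 and 1000 are all hypothetical values chosen for illustration.

```python
import random

rng = random.Random(12)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 voters, 60% favor a candidate (1 = yes, 0 = no).
population = [1] * 6_000 + [0] * 4_000

def sample_proportion(n):
    """Draw one simple random sample of size n and return the proportion of 'yes'."""
    sample = rng.sample(population, n)
    return sum(sample) / n

def spread(values):
    """Range of the estimates: a rough measure of sampling variability."""
    return max(values) - min(values)

# Repeat the survey 200 times at each sample size.
small = [sample_proportion(50) for _ in range(200)]
large = [sample_proportion(1000) for _ in range(200)]

print("n = 50:   estimates range over", round(spread(small), 3))
print("n = 1000: estimates range over", round(spread(large), 3))
```

Each run of the survey gives a slightly different statistic even though the parameter (0.6) never changes; the larger samples simply vary less.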
  8. Bias
    Bias: Systematic deviation from the truth, when our sample does not represent the population. It is introduced when sampling methods, by their nature, tend to over- or under-emphasize some characteristics of the population. It is almost impossible to recover from bias, so efforts to avoid it are well spent.
    Sources/Types of Bias: Voluntary Response Bias, Convenience Sampling Bias, Undercoverage, Nonresponse Bias, Response Bias
  9. Sources/Types of Bias
    Voluntary Response Bias: Bias introduced when individuals choose whether or not to participate in the sample. Samples based on voluntary response are always invalid and cannot be recovered, no matter how large the sample size.
    Convenience Bias: Bias introduced when individuals in the sample are conveniently available. These samples fail to represent the population because not every individual in the population is equally convenient to sample.
  10. Sources/Types of Bias
    Undercoverage: Some parts of the population are left out and are therefore under-represented.
    Nonresponse Bias: Arises when a large portion of those sampled fail to respond; voluntary response bias is a form of nonresponse bias.
    Response Bias: Arises when something in the survey influences the responses of those being sampled; the wording of a question or the interviewer's behavior can cause it. Poorly worded questions can confuse respondents.
  11. Randomizing Randomizing: protects us from the influences of all the features of our population by making sure that on average the sample looks like the rest of the population.
  12. Good Sampling Methods
    Good sampling methods help avoid bias and ensure that samples are representative of the population, so results can be generalized to the population.
    Methods: Simple Random Sample (SRS), Stratified Sample, Cluster Sample, Systematic Sample, Multistage Sample
  13. Good Sampling Methods
    Simple Random Sample (SRS) of size n: Each set of n elements in the population has an equal chance of selection. This is the standard against which we measure other sampling methods.
    Careful... the notion of an SRS can be tricky.
    Example: A class consists of 4 boys and 4 girls. The teacher wants a sample of two students. She decides to flip a coin. If the coin comes up heads, she will choose two boys by a random process. If tails, she will choose two girls by a random process.
    Question 1: Does each student have an equal chance of being in the sample? Answer: Yes. Each student has a 1/2 × 2/4 = 1/4 chance.
    Question 2: Is this an SRS of two students from the class? Answer: No. An SRS requires every possible pair of students to be equally likely, but this method can never select a pair made up of one boy and one girl.
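The classroom example can be checked by listing every possible pair. The sketch below is illustrative only: the labels B1 to B4 and G1 to G4 are hypothetical names for the students. It compares the coin-flip method with a true SRS of size 2, using exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical labels for the 4 boys and 4 girls in the example.
boys = ["B1", "B2", "B3", "B4"]
girls = ["G1", "G2", "G3", "G4"]
students = boys + girls

def coin_flip_scheme():
    """Probability of each possible pair under the teacher's coin-flip method."""
    probs = {}
    for group in (boys, girls):
        pairs = list(combinations(group, 2))
        for pair in pairs:
            # 1/2 for the coin, then every same-gender pair is equally likely.
            probs[frozenset(pair)] = Fraction(1, 2) * Fraction(1, len(pairs))
    return probs

def srs_scheme():
    """Probability of each possible pair under a simple random sample of size 2."""
    pairs = list(combinations(students, 2))
    return {frozenset(pair): Fraction(1, len(pairs)) for pair in pairs}

def inclusion_probability(probs, student):
    """Chance that a given student ends up in the sample."""
    return sum(p for pair, p in probs.items() if student in pair)

coin = coin_flip_scheme()
srs = srs_scheme()

# Every student has the same chance of selection under both methods...
for s in students:
    print(s, inclusion_probability(coin, s), inclusion_probability(srs, s))

# ...but a boy-girl pair is impossible under the coin-flip method.
mixed = frozenset({"B1", "G1"})
print("P(B1 and G1), coin flip:", coin.get(mixed, Fraction(0)))
print("P(B1 and G1), SRS:", srs[mixed])
```

Equal chances for each individual are therefore not enough; an SRS needs equal chances for each possible sample.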
  14. Good Sampling Methods
    Stratified Random Sampling: We first divide the population into strata (homogeneous groups), then choose an SRS within each stratum and combine the results. Stratifying can reduce sampling variability.
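As a rough illustration of stratified random sampling, here is a minimal sketch. The function name `stratified_sample`, the school population, and the choice to sample the same fraction from every stratum are assumptions made for this example, not something the slides prescribe.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, draw an SRS from each, and combine."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Same sampling fraction in every stratum (at least one unit each).
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 100 students in each of grades 9 to 12.
population = [(f"{grade}-{i}", grade) for grade in (9, 10, 11, 12) for i in range(100)]

# Stratify by grade and sample 10% of each grade.
sample = stratified_sample(population, lambda unit: unit[1], 0.10, seed=1)
print(len(sample), "students sampled")
```

Because every grade is guaranteed its share of the sample, no sample can happen to over-represent one grade, which is where the reduction in sampling variability comes from.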
  15. Good Sampling Methods
    Cluster Sampling: We split the population into clusters that resemble one another (each cluster is itself a heterogeneous group), then select one or a few clusters at random and perform a census within each of them. It is usually chosen as a matter of practicality, convenience, or cost, and it makes sampling tasks more manageable.
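Cluster sampling can be sketched in a few lines. The homeroom clusters and the function name `cluster_sample` below are hypothetical, chosen only to illustrate the idea of picking whole clusters at random and surveying everyone in them.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random, then take a census of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

# Hypothetical clusters: 12 homerooms of 25 students each.
clusters = {
    f"room{r}": [f"room{r}-student{s}" for s in range(25)]
    for r in range(12)
}

chosen, sample = cluster_sample(clusters, 2, seed=3)
print("Chosen homerooms:", chosen)
print("Students surveyed:", len(sample))
```

Only the clusters are selected at random; within a chosen cluster nobody is left out.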
  16. Good Sampling Methods
    Systematic Random Sampling: We start from a random spot on a list of subjects whose order is in no way associated with the response sought, and select every ith individual to become part of our sample. It can be much less expensive than true random sampling.
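A systematic sample is simple to express in code. This is a minimal sketch assuming a hypothetical roster of 200 students and a step of 25; it also assumes the roster order is unrelated to the response being measured, as the definition requires.

```python
import random

def systematic_sample(units, step, seed=None):
    """Start at a random position among the first `step` units, then take every step-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return units[start::step]

# Hypothetical roster of 200 students.
roster = [f"student{i}" for i in range(200)]

sample = systematic_sample(roster, step=25, seed=7)
print(sample)
```

With 200 names and a step of 25, every possible starting point yields exactly 8 students.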
  17. Good Sampling Methods Multistage sampling: sampling that combines several sampling methods; Example: a national polling service may stratify the country by geographical regions, select a random sample of cities from each region, then interview a cluster of residents in each city
  18. Example Mr. Salem would like to select 8 students to win FREE tickets to a “lock in” at HTHS that will include a 4-hour storytelling session given by himself. How could you get an SRS of the students at your school? How about stratified or cluster samples? Or a systematic sample?
    SRS:
    stratified sample:
    cluster sample:
    systematic sample:
  19. Sampling Methods to Avoid Avoid the following because they introduce bias: Voluntary Response Sample, Convenience Sample
  20. Size Matters Most people find it counterintuitive that the accuracy of survey or poll results is determined by the size of the sample regardless of the population size. By sampling 1000 voters we can estimate the outcome of an election with the same margin of error, whether it is a mayoral election in a city, a race for governor of a state, or the choice of a new president. (This is true as long as the population is much larger than the sample—if there are only 1200 voters in the city, then our sample will provide much more accurate results, of course.) How can this be?
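One way to see why the population size barely matters is to compute the margin of error directly. The sketch below uses the standard formula for a sample proportion with the finite population correction; the 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and the population sizes are assumptions chosen for illustration.

```python
import math

def margin_of_error(n, population_size, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from an SRS of size n.

    Assumes 95% confidence (z = 1.96) and worst-case p = 0.5 by default.
    The last factor is the finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error * correction

# Hypothetical electorates of very different sizes, each sampled with n = 1000.
electorates = [
    ("small city", 1_200),
    ("city", 100_000),
    ("state", 5_000_000),
    ("nation", 150_000_000),
]

for label, size in electorates:
    moe = margin_of_error(1000, size)
    print(f"{label:>10}: population {size:>11,}  margin of error {moe:.4f}")
```

For the city, state, and nation the margin of error is nearly identical, at roughly 3 percentage points, because the correction factor is close to 1 whenever the population is much larger than the sample. Only for the 1,200-voter city, where the sample covers most of the population, does the margin shrink noticeably.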
  21. Example Consider sampling a new flavor at Yogurt Mountain: You taste one spoonful as your sample, and if you like it you buy a cup of that yogurt. You are basing your decision on the assumption that your random spoonful is representative of the entire batch of yogurt. The spoonful tells you just as much whether it comes from a small tub or a large vat, as long as the yogurt is well mixed. The size and the randomness of the sample are what matter.