
Lecture 2 Sampling Techniques


chrisjones




Presentation Transcript


  1. Lecture 2: Sampling Techniques. For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester. Last modified: 4-8-2015

  2. Lecture Aim & Objectives Aim • To investigate issues relating to sampling techniques for survey research Objectives • What is a sample? • How should the sample be obtained? • Sampling considerations • Sampling techniques • Sources of error & degrees of confidence • How large should the sample be?

  3. What is Sampling? • Method for selecting people or things from which you plan to obtain data • Closely associated with quantitative methods • e.g. surveys or experiments • Sometimes associated with qualitative methods • e.g. content analysis & ethnography • Used because it’s rarely feasible or effective to include every person or item in a survey or study

  4. Not Feasible or Effective….. • Travel patterns of UK adults • Would need to survey 50mn+ people! • The UK government conducts a Census of Population every 10 years, but this costs tens of millions of pounds • Even a survey of annual cruise passengers visiting Molde would be costly & time-consuming • Sampling provides a feasible & effective solution

  5. What is a Sample? “A sample is a portion or sub-set of a larger group called a population” (Fink, 2003, p. 33) Note: sampling isn’t necessary when you survey the entire population!

  6. What is a Population? • It can consist of human & non-human phenomena • Organisations, businesses, geographical areas, households, individuals • Examples: • Hotels in Møre og Romsdal (population of hotels) • Beaches in Australia (population of beaches) • People in Norway (population of Norway) • Households in Molde (population of households) • Visitors to a resort (population of visitors) • Users of a ferry service (population of users) • Students at HiMolde (population of students)

  7. Aims of Sampling • Provide a small & more manageable portion or sub-set of the population • Represent the population & be free from bias • Results for the sample should be similar to those obtained if the survey were conducted on another sample from the same population • i.e. results are repeatable & reliable

  8. The Need for Reliable Representation

  9. Extracting a Sample Two main sources • From a sampling frame • A list of all known cases in a population from which a sample can be drawn • Sampled at source • Points in time/space where a potential population is available

  10. Typical Sampling Frames • Electoral register – individuals over 18 • Telephone directories – households • Royal Mail – households • Market research companies – households / postcodes / census areas • Businesses – customers • Organisations / clubs / trade associations – members • Magazines / newsletters – subscribers • Local authorities / CCI – households / employers • Business / trade directories – businesses • Yellow Pages – clubs / organisations / businesses • Tourism offices – reservations / visitors • Hotels / accommodation – registration records / reservations

  11. Sampling Frames • Only available where there is a finite population • i.e. where the population can be clearly defined • Potential problems • List not up to date / only updated periodically • Lags in registration & deregistration • Clusters of individuals create complexities • e.g. making sure you survey the correct individual in a sampling frame of households • Some cost money to access or are confidential

  12. Sampling at Source • When sampling at source, the population is not clearly defined • e.g. shopping streets, visitor attractions, transport terminals, museums, sporting events, etc. • Problems • The population is fairly vague (‘hanging around’) • Individuals present are not listed in any form that would constitute a sampling frame • Sampling is more challenging

  13. Sampling Considerations Two key Q’s to address in any sample survey • How should the sample be obtained? • Who or what should be sampled (eligibility criteria)? • Who do you survey (profiles & individuals in clusters)? • When should sampling take place (timing & timescale)? • Where should the survey be administered (location)? • What sampling technique do you use (probability versus non-probability)? • How large should the sample be?

  14. How Should the Sample be Obtained? • Who or what should be sampled? • i.e. define the eligibility criteria • Who do you survey? • Households, visitor attractions, shopping streets, etc. will normally have people in clusters as opposed to individuals • Ensure that the survey is completed by the correct individual

  15. How Should the Sample be Obtained? • When should the sampling take place? • Time of year, month, day, time • Duration of the sampling process • Useful to • Have some prior knowledge of the phenomena to be sampled, as results may be biased by the time of day or by weekly, monthly & seasonal variations • Spread the sampling over different times, days, months, etc. to reduce potential for bias

  16. How Should the Sample be Obtained? • Where should the survey be administered? • This could be determined by the definition of the population • e.g. surveys sent to postal addresses • On-site surveys should consider the location of interviewers • e.g. recreation areas or tourist attractions tend to have natural or pre-defined entry & exit points • If using multiple interviewers, strict instructions must be given on where to stand

  17. How Should the Sample be Obtained? • What sampling technique should be used? Two main options Probability Techniques 1. Simple random sampling 2. Systematic random sampling 3. Stratified random sampling 4. Cluster sampling 5. Multi-stage sampling Non-Probability Techniques 1. Haphazard sampling 2. Purposive sampling a. Judgement sampling b. Quota sampling c. Snowball sampling d. Expert choice sampling

  18. Sampling Techniques • Choice of technique depends on 2 Q’s • Is the population known/clearly defined? • Can the population be listed as a sampling frame? • No or uncertain: sampling is complex & based on non-probability techniques (used when sampling at source) • Yes to either Q: allows for probability techniques (used with sampling frames)

  19. Probability Sampling Techniques • Simple random sampling • Each unit has an equal chance of selection • e.g. lottery draw, names pulled from a list • Probability of selection is: • (sample size/total population)*100 • e.g. (100/1,000)*100 = 10% (a 1 in 10 chance) • Should really use a table of random numbers • e.g. see http://stattrek.com/Tables/Random.aspx
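The selection-probability formula above can be checked with a few lines of Python. This is a minimal sketch: `selection_probability` is a hypothetical helper name, the 1,000 numbered units are made up, and Python's built-in pseudo-random generator (`random.sample`, which draws without repeats) stands in for a printed table of random numbers.

```python
import random

def selection_probability(sample_size: int, population_size: int) -> float:
    """Percentage chance that any one unit is selected in a simple random
    sample drawn without replacement: (sample size / population size) * 100."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    return sample_size * 100 / population_size

print(selection_probability(100, 1_000))  # 10.0, i.e. a 1 in 10 chance

# Draw the sample itself: 100 of 1,000 hypothetical unit IDs, no repeats
population = list(range(1, 1_001))
sample = random.sample(population, k=100)
```

Every unit in `population` has the same chance of ending up in `sample`, which is the defining property of simple random sampling.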

  20. Table of Random Numbers • Create a sample of 10 from a population of Norway’s top 30 football clubs
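The random-number-table procedure can be mimicked in code: number the 30 units 01-30, read two-digit numbers one after another, keep those between 01 and 30, and skip 00, anything above 30 and any number already chosen. This is a sketch under assumptions: the "table" is generated by Python's `random` module, and `sample_with_random_number_table` and the `Club 1`...`Club 30` labels are hypothetical placeholders, not the real clubs.

```python
import random

def sample_with_random_number_table(population_size, sample_size, seed=None):
    """Mimic reading a random number table: keep numbers 1..population_size,
    skip 0, out-of-range numbers and repeats until the sample is complete."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    digits = len(str(population_size))   # 30 units -> read two-digit numbers
    upper = 10 ** digits                 # numbers 00-99
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(upper)    # next entry read from the "table"
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen

clubs = [f"Club {i}" for i in range(1, 31)]  # hypothetical list of 30 clubs
picked = sample_with_random_number_table(30, 10, seed=1)
print([clubs[i - 1] for i in sorted(picked)])
```

Skipping out-of-range numbers wastes some draws, but it keeps every club's chance of selection equal, which is why the manual table method works the same way.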

  21. Your turn….. Create a sample of 10 from a population of England’s top 30 football clubs

  22. Simple Random Sampling • Quick, cheap n’ easy… • Each unit has an equal chance of selection… • Need to list units of the population • Difficult to do with a large sampling frame…

  23. Probability Sampling Techniques • Systematic random sampling • Pull one unit from a list at regular intervals • e.g. every nth name from a membership list • Commonly used by production companies to survey product quality

  24. Procedure for Systematic Random Sampling

  25. Example using a small sampling frame of 30 students • Sample 10 from a population of 30 • Sampling interval = 30/10 = 3, so select a random number between 1 & 3 to start from (e.g. 2), then select every 3rd number (2, 5, 8, 11, 14, 17, 20, 23, 26, 29)
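The systematic procedure in the example reduces to a sampling interval k = population size / sample size and a random start between 1 and k. A minimal sketch, assuming the frame is an ordered Python list and positions are counted from 1 as on the slide; `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(frame, sample_size, start=None):
    """Select every k-th unit from an ordered list, where k = len(frame) // sample_size,
    beginning at a start position between 1 and k (1-based, as on the slide)."""
    k = len(frame) // sample_size            # sampling interval, e.g. 30 // 10 = 3
    if k < 1:
        raise ValueError("sample size cannot exceed the sampling frame")
    if start is None:
        start = random.randint(1, k)         # random start between 1 and k inclusive
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the sampling interval")
    # Walk the frame in steps of k; trim in case the frame size is not an exact multiple
    return [frame[i - 1] for i in range(start, len(frame) + 1, k)][:sample_size]

students = list(range(1, 31))  # hypothetical frame: students numbered 1-30
print(systematic_sample(students, 10, start=2))
# [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```

Only the start is random; after that the interval fixes every other selection, which is what makes the method quick to apply to a long list.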

  26. Your turn…..Sample 6 from the list of 30, starting at 3

  27. Probability Sampling Techniques • Stratified random sampling • Simple/systematic sampling could miss particular groups when using a small population • e.g. mature students • Prior knowledge may suggest that inclusion of a group(s) is necessary • e.g. mature students perform better than others • Stratified random sampling samples according to groups (strata)

  28. Procedure for Stratified Random Sampling

  29. Example: Survey a Sample of 400 Households in a County • [Figure: map of the county's 4 districts, labelled 25%, 40%, 25% & 10%, with 100 households selected from each] • Randomly select an equal amount from each of the 4 districts in the county (e.g. 100 from each for a sample of 400)
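The stratified example (100 households at random from each of 4 districts) can be sketched as one simple random sample per stratum. The district sizes below are hypothetical: a county of 10,000 households split 40%, 10%, 25% and 25% across districts 1-4, and `stratified_sample` is an illustrative helper name.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw the same number of units at random from every stratum (equal allocation)."""
    rng = random.Random(seed)
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

# Hypothetical county: household counts per district (40%, 10%, 25%, 25% of 10,000)
sizes = {"District 1": 4_000, "District 2": 1_000, "District 3": 2_500, "District 4": 2_500}
county = {d: [f"{d}/household {i}" for i in range(1, n + 1)] for d, n in sizes.items()}

sample = stratified_sample(county, per_stratum=100, seed=42)
print({d: len(units) for d, units in sample.items()})  # 100 from each district
print(sum(len(units) for units in sample.values()))    # 400
```

Equal allocation guarantees that even the smallest district contributes 100 households; the trade-off is that smaller districts are over-represented relative to their share of the county.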

  30. Problem Associated with Multiple Variables • The sample is representative of a single variable but not of others • e.g. representative of the 4 districts in the county but not necessarily of age of residents • Where multiple variables are required, the benefits of stratified random sampling diminish in favour of simple/systematic random sampling • This problem is less likely when creating a large sample

  31. Problem Associated with Time & Cost • Stratified sampling divides the population into groups, then selects units using random sampling • Random sampling may produce a sample that is geographically dispersed • Especially problematic for face-to-face surveys • e.g. the 100 units selected for the household survey in districts 1-4 may come from different parts of each district, and interviewers may need to travel vast distances between units to conduct their surveys • Clustering can overcome this problem

  32. Probability Sampling Techniques • Cluster sampling • Draw from mutually exclusive sub-groups • e.g. the 100 units selected for the household survey in districts 1-4 will be selected in clusters (groups of neighbouring households) instead of individually at random

  33. Example: Stratified versus Cluster • [Figure: two maps of the county's 4 districts, each labelled 25%, 40%, 25% & 10%] • Stratified takes an equal amount from each (e.g. 100 from each for a sample of 400) • Cluster takes a proportionate amount from each & in clusters (e.g. 16 clusters of 10 from district 1, 4 clusters of 10 from district 2, and 10 clusters of 10 from each of districts 3 & 4, for a sample of 400)
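The cluster example can be sketched by allocating the 40 clusters of 10 in proportion to each district's share of households, then selecting whole clusters at random. Assumptions, all hypothetical: the same 10,000-household county (40%, 10%, 25%, 25%), and each run of 10 consecutive household IDs treated as one cluster of neighbours (e.g. a street). `cluster_sample` is an illustrative name.

```python
import random

CLUSTER_SIZE = 10  # hypothetical: 10 neighbouring households per cluster

def cluster_sample(districts, total_sample, seed=None):
    """Select whole clusters at random, with clusters allocated to each district
    in proportion to its share of households (rounded to whole clusters)."""
    rng = random.Random(seed)
    county_size = sum(len(h) for h in districts.values())
    total_clusters = total_sample // CLUSTER_SIZE       # 400 // 10 = 40 clusters
    sample = {}
    for name, households in districts.items():
        # Split the district's ordered household list into consecutive clusters
        clusters = [households[i:i + CLUSTER_SIZE]
                    for i in range(0, len(households), CLUSTER_SIZE)]
        n_clusters = round(total_clusters * len(households) / county_size)
        chosen = rng.sample(clusters, n_clusters)
        # Every household in a chosen cluster is surveyed
        sample[name] = [h for cluster in chosen for h in cluster]
    return sample

sizes = {"District 1": 4_000, "District 2": 1_000, "District 3": 2_500, "District 4": 2_500}
county = {d: [f"{d}/household {i}" for i in range(1, n + 1)] for d, n in sizes.items()}
sample = cluster_sample(county, total_sample=400, seed=42)
print({d: len(h) for d, h in sample.items()})
```

With these shares the rounding is exact (16, 4, 10 and 10 clusters); with other proportions the rounded cluster counts may not add up to exactly 40, so a real design would need a rule for the leftovers.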

  34. The Problem with Cluster Sampling • Whilst cluster sampling provides huge time & cost savings, it is likely to have a much greater potential for sampling error • i.e. certain parts of each district will be excluded

  35. Probability Sampling Techniques • Multi-stage sampling • Experts increasingly use a combination of probability sampling techniques • e.g. sample attitudes to tourists in Norway’s towns • Draw up a sampling frame of towns in Norway • Randomly (simple, systematic or stratified) select an appropriate number of towns • Randomly select an appropriate number of electoral wards (geographical units from which politicians are elected) from each town • Randomly select an appropriate number of voters from the electoral register of each ward
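The three stages in the towns example can be sketched as nested random draws: towns from the list of towns, wards within each chosen town, then voters from each chosen ward's register. Assumptions: simple random sampling is used at every stage (the slide also allows systematic or stratified selection), and the frame of 20 towns with 8 wards of 500 registered voters each, plus the `multi_stage_sample` name, are hypothetical.

```python
import random

def multi_stage_sample(towns, n_towns, n_wards, n_voters, seed=None):
    """Stage 1: towns; stage 2: wards within each chosen town;
    stage 3: voters from each chosen ward's electoral register."""
    rng = random.Random(seed)
    sample = []
    for town in rng.sample(sorted(towns), n_towns):
        wards = towns[town]
        for ward in rng.sample(sorted(wards), n_wards):
            sample.extend(rng.sample(wards[ward], n_voters))
    return sample

# Hypothetical frame: 20 towns, each with 8 wards of 500 registered voters
towns = {
    f"Town {t}": {f"Ward {w}": [f"T{t}-W{w}-voter {v}" for v in range(1, 501)]
                  for w in range(1, 9)}
    for t in range(1, 21)
}
voters = multi_stage_sample(towns, n_towns=5, n_wards=3, n_voters=20, seed=7)
print(len(voters))  # 5 towns x 3 wards x 20 voters = 300
```

Only the registers of the chosen wards are ever needed, which is the practical saving multi-stage designs offer over listing every voter in the country.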

  36. Non-Probability Sampling Techniques • Haphazard sampling (accidental, convenience or availability) • Samples drawn at the convenience of the interviewer • e.g. people on a street who are available & willing to participate • This technique should still be systematic • e.g. stop 1 in every 10 passers-by • Don’t just stop those you fancy!

  37. Non-Probability Sampling Techniques • Purposive sampling • Judgement: samples are believed to possess the necessary attributes • e.g. mature students for a survey on mature students • Quota: selection according to pre-specified quotas • e.g. select 75 out of 100 units aged 21-25 (and 25 aged 26+), on the presumption that 75% of mature students are 21-25 and 25% are 26+ • The problem is that you need to decide which specific characteristics to set quotas for (age, gender, income?)
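Quota selection can be sketched as accepting respondents in the order they are met until each quota (75 aged 21-25, 25 aged 26+) is full. This is an illustration only: `fill_quotas` is a hypothetical helper, and the stream of students is simulated at random purely to have input. Note that who ends up in the sample depends on who happens to turn up first, which is why quota sampling is non-probability.

```python
import random

def fill_quotas(respondents, quotas):
    """Accept (id, group) pairs in the order met until every quota is full;
    respondents whose group is full (or not in the quotas) are turned away."""
    filled = {group: [] for group in quotas}
    for person, group in respondents:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # all quotas met: stop interviewing
    return filled

rng = random.Random(3)
# Hypothetical stream of mature students approached one after another
stream = ((f"student {i}", rng.choice(["21-25", "26+"])) for i in range(1, 1_001))
result = fill_quotas(stream, {"21-25": 75, "26+": 25})
print({g: len(p) for g, p in result.items()})  # {'21-25': 75, '26+': 25}
```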

  38. Non-Probability Sampling Techniques • Snowball: one sampling unit refers another, who refers another, etc. • e.g. expats refer other expats for a survey on expats • Not particularly representative but useful when the population is hard to find or access (e.g. the homeless) • Expert choice: asks experts to choose typical units • e.g. representative individuals or cities • Often referred to as a ‘panel of experts’ • This helps elicit views of persons with specific expertise • Also means they help to validate & ‘defend’ any results

  39. Probability versus Non-probability Sampling Techniques • In probability sampling • Representation is determined by the fact that every unit has an equal chance of being selected, based on probability theory • In non-probability sampling • There is an assumption that there is an even distribution of characteristics within the population • BUT, the population may or may not be represented and it will be hard to know which is true

  40. Why Might the Following Approaches to Sampling be Biased? • I want to survey golf club members’ attitudes to the quality of the greens, so I survey a sample of the top 25 players at the club • I want to survey people in Molde to find out what they think about my cafe, so I survey every 10th customer in the cafe. Surveys are conducted every Monday morning • I survey 2,500 bus passengers in Ålesund, over a series of times, days and months, to ask what they think about the availability of bus services in Ålesund

  41. Sources of Error • Non-sampling errors (i.e. from survey design or delivery) • Non-observation errors: failing to obtain data from certain segments of the population due to non-response or exclusion • Observation errors: inaccurate information obtained from the samples or errors in data processing, analysis or reporting

  42. Sources of Error • Sampling error (i.e. from sampling) • Where the sample drawn may not provide the same estimates of certain characteristics as other same-size samples from the population

  43. Example of Sampling Error • Age of Squash club members (n=40): 24, 21, 23, 16, 17, 56, 60, 64, 58, 57, 60, 47, 42, 41, 40, 22, 35, 38, 40, 41, 49, 19, 19, 20, 35, 27, 28, 29, 30, 71, 66, 21, 23, 26, 27, 30, 31, 45, 55 • Overall average is 37.5 years (population parameter) • Average for 5 separate samples of 10 members • 35.7, 39.5, 23.1, 51.3, 30.3 (estimates) • Accuracy (AKA standard error) of sample means can be calculated for probability samples

  44. Standard Error • Accuracy is often quoted in studies • e.g. “56% of customers were more than satisfied with service quality; this estimate is subject to a 2% error either way” • The 2% error is called the standard error • Measures statistical accuracy of the sample • Standard error decreases as sample size increases • Zero error when the sample is the population

  45. Calculating the Standard Error • Standard error = sdev / √n • sdev: standard deviation of the sample • n: sample size • Example • Random sample of 50 customers has a mean age of 23.4 and a standard deviation of 9.7 • Standard error = 9.7 / √50 = 1.4 • Therefore, the population mean is likely to be 23.4 +/- 1.4 (i.e. between 22.0 and 24.8 years)
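The worked example can be reproduced directly from the formula standard error = sdev / √n. A minimal sketch; `standard_error` is an illustrative helper name and the figures are those of the customer-age example.

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return sample_sd / math.sqrt(n)

mean, sd, n = 23.4, 9.7, 50          # customer-age example
se = standard_error(sd, n)
print(round(se, 2))                  # 1.37 (rounded to 1.4 in the example)
print(round(mean - se, 1), round(mean + se, 1))  # 22.0 24.8
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.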

  46. Degrees of Confidence • Standard error doesn’t say how likely it is (i.e. how confident we can be) that the estimated range is correct • We use principles of standard deviation to determine the level of confidence in our estimated range

  47. [Figure: normal distribution curve marking the mean and ±1, ±2 & ±3 standard deviations, enclosing 68%, 95% & 99% of responses respectively] • 95% of responses fall within 2 sdev’s of the mean

  48. Degrees of Confidence • A range of ±2 standard errors means we can be 95% confident (i.e. correct 95 times out of 100) that the sample mean lies within 2 standard errors of the population mean • Calculating 95% confidence for the earlier example • Where we said that the population mean is likely to be 23.4 +/- 1.4 (i.e. between 22.0 and 24.8 years) • 23.4 +/- 2.8 (standard error of 1.4 x 2) provides a range of 20.6 to 26.2 • Therefore, we can be 95% confident that the population mean is between 20.6 and 26.2 years • Do the same for the 99% level of confidence…..
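The confidence range is the sample mean plus or minus a multiplier times the standard error. The sketch below uses the lecture's rounded multiplier of 2 for 95%; `z_multiplier` shows where the exact normal-distribution multipliers come from (about 1.96 for 95% and 2.576 for 99%), using Python's `statistics.NormalDist`. Both helper names are illustrative.

```python
from statistics import NormalDist

def confidence_interval(mean, se, multiplier):
    """Range of mean +/- multiplier x standard error."""
    margin = multiplier * se
    return mean - margin, mean + margin

def z_multiplier(confidence):
    """Exact two-sided normal multiplier, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

low, high = confidence_interval(23.4, 1.4, multiplier=2)  # lecture's rule of thumb for 95%
print(f"95%: {low:.1f} to {high:.1f}")                     # 95%: 20.6 to 26.2
print(round(z_multiplier(0.95), 2))                        # 1.96
```

Passing a different multiplier to `confidence_interval` gives the range for other confidence levels; a higher level of confidence always produces a wider range.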

  49. Acceptable Level of Confidence? • 68% of all sample means would fall within a range of +/- 1 standard error of the population mean • This means that we would be 68% confident that the population mean is between 22.0 & 24.8 years • The 68% level of confidence means there is a 32% chance of being incorrect • 95% is normally used as the acceptable level of confidence for statistical analysis

  50. How Large Should the Sample be? • Sample size is NOT relative to population size! • Sample size is absolute • e.g. provided sampling procedures have been followed, a sample size of 1,000 is equally valid for a population of British adults (50mn), London residents (7mn) or Molde residents (24,000) • Sample size is determined by • The availability of resources • The purpose of data you intend to collect • The required level of accuracy in the results • The required level of confidence
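One way to see that the required accuracy and level of confidence, rather than population size, drive sample size is to rearrange multiplier × sdev / √n ≤ margin into n ≥ (multiplier × sdev / margin)². This sketch assumes a standard deviation estimated in advance (here the 9.7 years from the customer example) and simple random sampling; it ignores the finite population correction, and `required_sample_size` is an illustrative name. The population size appears nowhere in the calculation.

```python
import math

def required_sample_size(sd, margin, multiplier=2):
    """Smallest n for which multiplier x sd / sqrt(n) <= margin.
    The population size does not appear in the formula."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil((multiplier * sd / margin) ** 2)

# Assumed planning value: sd of 9.7 years; ~95% confidence (multiplier 2)
print(required_sample_size(9.7, margin=1))  # 377 for +/- 1 year
print(required_sample_size(9.7, margin=2))  # 95 for +/- 2 years
```

Halving the acceptable margin roughly quadruples the sample needed, which is why the required accuracy weighs so heavily against the available resources.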
