Lecture 2: Sampling Techniques. For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester. Last modified: 4-8-2015
Lecture Aim & Objectives Aim • To investigate issues relating to sampling techniques for survey research Objectives • What is a sample? • How should the sample be obtained? • Sampling considerations • Sampling techniques • Sources of error & degrees of confidence • How large should the sample be?
What is Sampling? • Method for selecting people or things from which you plan to obtain data • Closely associated with quantitative methods • e.g. surveys or experiments • Sometimes associated with qualitative methods • e.g. content analysis & ethnography • Used because it’s rarely feasible or effective to include every person or item in a survey or study
Not Feasible or Effective… • Travel patterns of UK adults • Need to survey 50mn+ people! • The UK government conducts a Census of Population every 10 years but this costs tens of millions of pounds • Even a survey of annual cruise passengers visiting Molde would be costly & time consuming • Sampling provides a feasible & effective solution
What is a Sample? “A sample is a portion or sub-set of a larger group called a population” (Fink, 2003; p33) Note: sampling isn’t necessary when you survey the entire population!
What is a Population? • It can consist of human & non-human phenomena • Organisations, businesses, geographical areas, households, individuals • Examples: • Hotels in Møre og Romsdal (population of hotels) • Beaches in Australia (population of beaches) • People in Norway (population of Norway) • Households in Molde (population of households) • Visitors to a resort (population of visitors) • Users of a ferry service (population of users) • Students at HiMolde (population of students)
Aims of Sampling • Provide a small & more manageable portion or sub-set of the population • Represent the population & be free from bias • Results for the sample should be similar if the survey was conducted on another sample from the same population • i.e. results are repeatable & reliable
Extracting a Sample Two main sources • From a sampling frame • A list of all known cases in a population from which a sample can be drawn • Sampled at source • Points in time/space where a potential population is available
Typical Sampling Frames • Electoral register – individuals over 18 • Telephone directories – households • Royal Mail – households • Market research companies – households / postcodes / census areas • Businesses – customers • Organisations / clubs / trade associations – members • Magazines / newsletters – subscribers • Local authorities / CCI – households / employers • Business / trade directories – businesses • Yellow pages – clubs / organisations / businesses • Tourism offices – reservations / visitors • Hotels / accommodation – registration records / reservations
Sampling Frames • Only available where there is a finite population • i.e. where the population can be clearly defined • Potential problems • List not up-to-date / only up-dated periodically • Lags in registration & deregistration • Clusters of individuals create complexities • e.g. making sure you survey the correct individual in a sampling frame of households • Some cost money to access or are confidential
Sampling at Source • The population is not clearly defined when sampling at source • e.g. shopping streets, visitor attractions, transport terminals, museums, sporting events, etc • Problems • The population is fairly vague (‘hanging around’) • Individuals present are not listed in any form which would constitute a sampling frame • Sampling is more challenging
Sampling Considerations Two key Q’s to address in any sample survey • How should the sample be obtained? • Who or what should be sampled (eligibility criteria)? • Who do you survey (profiles & individuals in clusters)? • When should sampling take place (timing & timescale)? • Where should the survey be administered (location)? • What sampling technique do you use (probability versus non-probability)? • How large should the sample be?
How Should the Sample be Obtained? • Who or what should be sampled? • Therefore defining the eligibility criteria • Who do you survey? • Households, visitor attractions, shopping streets, etc will normally have people in clusters as opposed to individuals • Ensure that the survey is completed by the correct individual
How Should the Sample be Obtained? • When should the sampling take place? • Time of year, month, day, time • Duration of the sampling process • Useful to • Have some prior knowledge of the phenomena to be sampled as results may be biased by particular times of day or year or weekly, monthly & seasonal variations • Spread the sampling over different times, days, months, etc to reduce potential for bias
How Should the Sample be Obtained? • Where should the survey be administered? • This could be determined by the definition of the population • e.g. surveys sent to postal addresses • On-site surveys should consider location of interviewers • e.g. recreation areas or tourist attractions tend to have natural or pre-defined entry & exit points • If using multiple interviewers, strict instructions must be given on where to stand
How Should the Sample be Obtained? • What sampling technique should be used? Two main options • Probability Techniques: 1. Simple random sampling 2. Systematic random sampling 3. Stratified random sampling 4. Cluster sampling 5. Multi-stage sampling • Non-Probability Techniques: 1. Haphazard sampling 2. Purposive sampling (a. Judgement sampling b. Quota sampling c. Snowball sampling d. Expert choice sampling)
Sampling Techniques • Choice of technique is dependent on 2 Q’s • Is the population known/clearly defined? • Can the population be listed as a sampling frame? • Yes to either Q: allows for Probability Techniques (used with sampling frames) • No or uncertainty: sampling is complex & based on Non-Probability Techniques (used when sampling at source)
Probability Sampling Techniques • Simple random sampling • Each unit has an equal chance of selection • e.g. lottery draw, names pulled from a list • Probability of selection is: • (sample size/total population)*100 • e.g. (100/1,000)*100 = 10% (a 1 in 10 chance) • Should really use a table of random numbers • e.g. see http://stattrek.com/Tables/Random.aspx
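A minimal Python sketch of simple random sampling, as an alternative to reading a printed table of random numbers. The frame of 1,000 numbered units and the seed are assumptions made for illustration; only the probability formula and the 100-from-1,000 example come from the slide.

```python
import random

# Hypothetical sampling frame: 1,000 numbered units (illustration only).
population = list(range(1, 1001))
sample_size = 100

# Probability of selection from the slide: (sample size / total population) * 100
selection_probability = 100 * sample_size / len(population)

# random.sample draws without replacement, so no unit can be picked twice.
rng = random.Random(2015)  # fixed seed only so that the draw is repeatable
sample = rng.sample(population, sample_size)

print(f"Chance of selection: {selection_probability:.0f}%")
print("First ten selected units:", sorted(sample)[:10])
```

Every unit in the frame has the same chance of ending up in `sample`, which is the defining property of the technique.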
Table of Random Numbers • Create a sample of 10 from a population of Norway’s top 30 football clubs
Your turn….. Create a sample of 10 from a population of England’s top 30 football clubs
Simple Random Sampling • Quick, cheap & easy… • Each unit has an equal chance of selection… • Need to list units of the population • Difficult to do with a large sampling frame…
Probability Sampling Techniques • Systematic random sampling • Pull one unit from a list at regular intervals • e.g. every nth name from a membership list • Commonly used by production companies to survey product quality
Example (using a small sampling frame) of 30 students • Sample 10 from a population of 30 • 30/10=3, select a number between 1 & 3 to start from (e.g. 2), then select every 3rd number
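The student example can be sketched in Python as follows. The function name `systematic_sample` and the numbering of students from 1 to 30 are assumptions for illustration; the interval of 3 and the start point of 2 are taken from the slide.

```python
import random

def systematic_sample(frame, sample_size, start=None, rng=random):
    """Select every k-th unit from the frame, where k = frame size // sample size."""
    interval = len(frame) // sample_size
    if start is None:
        # Random start between 1 and the interval (inclusive), as on the slide.
        start = rng.randint(1, interval)
    # start is 1-based, list indices are 0-based.
    return frame[start - 1::interval][:sample_size]

# Hypothetical sampling frame: 30 students numbered 1 to 30.
students = list(range(1, 31))

sample = systematic_sample(students, sample_size=10, start=2)
print(sample)  # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```

Leaving out `start` lets the function choose the starting point at random, which is what keeps the technique a probability method.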
Probability Sampling Techniques • Stratified random sampling • Simple/systematic could miss particular groups when using a small population • e.g. mature students • Prior knowledge may suggest that inclusion of a group(s) is necessary • e.g. mature students perform better than others • Stratified random sampling samples according to groups (strata)
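Example: Survey a Sample of 400 Households in a County • [Figure: map of a county with 4 districts labelled 25%, 40%, 25% & 10%, with 100 households sampled from each district] • Randomly select an equal amount from each of the 4 districts in the county (e.g. 100 from each for a sample of 400)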
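A rough Python sketch of the county example. The county total of 10,000 households, the district names and the assignment of the 40/10/25/25 split to districts 1 to 4 are assumptions for illustration; the equal allocation of 100 households per district comes from the slide.

```python
import random

rng = random.Random(1)  # fixed seed only so that the draw is repeatable

# Hypothetical frame: 10,000 households split 40/10/25/25 across 4 districts.
district_sizes = {
    "District 1": 4000,
    "District 2": 1000,
    "District 3": 2500,
    "District 4": 2500,
}
frame = {
    district: [f"{district}, household {i}" for i in range(1, size + 1)]
    for district, size in district_sizes.items()
}

# Stratified random sampling: a separate simple random sample in each stratum.
per_stratum = 100
sample = {
    district: rng.sample(households, per_stratum)
    for district, households in frame.items()
}

total = sum(len(selected) for selected in sample.values())
print(total)  # 400
```

Because each district is sampled separately, no district can be missed by chance, which is the point of stratifying.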
Problem Associated with Multiple Variables • The sample is representative of a single variable but not of others • e.g. representative of the 4 districts in the county but not necessarily of age of residents • Where multiple variables are required, the benefits of stratified random sampling diminish in favour of simple/systematic random sampling • This problem is less likely when creating a large sample
Problem Associated with Time & Cost • Stratified divides into groups, then selects units using random sampling • Random sampling may produce a sample that is geographically dispersed • Especially problematic for face-to-face surveys • e.g. the 100 units selected for the household survey in districts 1-4 may come from different parts of each district and interviewers may need to travel vast distances between each unit to conduct their surveys • Clustering can overcome this problem
Probability Sampling Techniques • Cluster sampling • Draw from mutually exclusive sub-groups • e.g. the 100 units selected for the household survey in districts 1-4 will be selected in clusters instead of randomly
Example: Stratified versus Cluster • [Figure: two maps of the county, each with 4 districts labelled 25%, 40%, 25% & 10%] • Stratified takes an equal amount from each (e.g. 100 from each for a sample of 400) • Cluster takes a proportionate amount from each & in clusters (e.g. 16 clusters of 10 from district 1, 4 clusters of 10 from district 2, 10 clusters of 10 from districts 3 & 4, for a sample of 400)
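The cluster allocation in this example can be checked with a short Python sketch. The function name `clusters_per_district` is hypothetical, and the district shares (40%, 10%, 25%, 25% for districts 1 to 4) are inferred from the cluster counts quoted on the slide.

```python
# District shares of households in percent, inferred from the slide's cluster counts.
district_share_pct = {1: 40, 2: 10, 3: 25, 4: 25}
sample_size = 400
cluster_size = 10

def clusters_per_district(share_pct, sample_size, cluster_size):
    """Allocate the sample in proportion to each district, in whole clusters."""
    allocation = {}
    for district, pct in share_pct.items():
        households = sample_size * pct // 100  # proportionate share of the sample
        allocation[district] = households // cluster_size
    return allocation

allocation = clusters_per_district(district_share_pct, sample_size, cluster_size)
print(allocation)  # {1: 16, 2: 4, 3: 10, 4: 10}
print(sum(allocation.values()) * cluster_size)  # 400
```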
The Problem with Cluster Sampling • Whilst cluster sampling provides huge time & cost savings, it is likely to have a much greater potential for sampling error • i.e. certain parts of each district will be excluded
Probability Sampling Techniques • Multi-stage sampling • Experts increasingly use a combination of probability sampling techniques • e.g. sample attitudes to tourists in Norway’s towns • Draw up a sampling frame of towns in Norway • Randomly (simple, systematic or stratified) select an appropriate number of towns • Randomly select an appropriate number of electoral wards (geographical units from which politicians are elected) from each town • Randomly select an appropriate number of voters from the electoral register of each ward
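The three stages above might be sketched in Python as follows. The frame is entirely made up (20 towns, 6 wards per town, 200 voters per ward), as are the function name and the numbers selected at each stage; simple random sampling is assumed at every stage.

```python
import random

rng = random.Random(7)  # fixed seed only so that the draw is repeatable

# Hypothetical frame: town -> electoral ward -> voters. All names are placeholders.
frame = {
    f"Town {t}": {
        f"Ward {t}.{w}": [f"Voter {t}.{w}.{v}" for v in range(1, 201)]
        for w in range(1, 7)
    }
    for t in range(1, 21)
}

def multi_stage_sample(frame, n_towns, n_wards, n_voters, rng):
    """Stage 1: towns. Stage 2: wards within each town. Stage 3: voters within each ward."""
    selected = []
    for town in rng.sample(sorted(frame), n_towns):
        wards = frame[town]
        for ward in rng.sample(sorted(wards), n_wards):
            selected.extend(rng.sample(wards[ward], n_voters))
    return selected

voters = multi_stage_sample(frame, n_towns=5, n_wards=2, n_voters=10, rng=rng)
print(len(voters))  # 5 towns x 2 wards x 10 voters = 100
```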
Non-Probability Sampling Techniques • Haphazard sampling (accidental, convenience or availability) • Samples drawn at the convenience of the interviewer • e.g. people on a street that are available & willing to participate • This technique should still be systematic • e.g. stop 1 in every 10 passers-by • Don’t just stop those that you fancy.............!
Non-Probability Sampling Techniques • Purposive sampling • Judgement: samples are believed to possess the necessary attributes • e.g. mature students for a survey on mature students • Quota: selection according to pre-specified quotas • e.g. in a sample of 100 units, select 75 aged 21-25 & 25 aged 26+, on the presumption that 75% of mature students are 21-25 and 25% are 26+ • The problem is that you need to decide which specific characteristics to set quotas for (age, gender, income?)
Non-Probability Sampling Techniques • Snowball: one sampling unit refers another, who refers another, etc • e.g. expats refer other expats for a survey on expats • Not particularly representative but useful when the population is hard to find or access (e.g. the homeless) • Expert choice: asks experts to choose typical units • i.e. representative individuals or cities • Often referred to as a ‘panel of experts’ • This helps elicit views of persons with specific expertise • Also means they help to validate & ‘defend’ any results
Probability versus Non-probability Sampling Techniques • In probability sampling • Representation is determined by the fact that every unit has a known chance of being selected (an equal chance in simple random sampling), based on probability theory • In non-probability sampling • There is an assumption that there is an even distribution of characteristics within the population • BUT, the population may or may not be represented and it will be hard to know which is true
Why Might the Following Approaches to Sampling be Biased? • I want to survey golf club members’ attitudes to the quality of the greens, so I survey a sample of the top 25 players at the club • I want to survey people in Molde to find out what they think about my cafe, so I survey every 10th customer in the cafe. Surveys are conducted every Monday morning • I survey 2,500 bus passengers in Ålesund, over a series of times, days and months, to ask what they think about the availability of bus services in Ålesund
Sources of Error • Non-sampling errors (i.e. from survey design or delivery) • Non-observation errors: failing to obtain data from certain segments of the population due to non-response or exclusion • Observation errors: inaccurate information obtained from the samples or errors in data processing, analysis or reporting
Sources of Error • Sampling error (i.e. from sampling) • Where the sample drawn may not provide the same estimates of certain characteristics as other same-size samples from the population
Example of Sampling Error • Age of Squash club members (n=40): 24, 21, 23, 16, 17, 56, 60, 64, 58, 57, 60, 47, 42, 41, 40, 22, 35, 38, 40, 41, 49, 19, 19, 20, 35, 27, 28, 29, 30, 71, 66, 21, 23, 26, 27, 30, 31, 45, 55 • Overall average is 37.5 years (population parameter) • Average for 5 separate samples of 10 members • 35.7, 39.5, 23.1, 51.3, 30.3 (estimates) • Accuracy (AKA standard error) of sample means can be calculated for probability samples
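Sampling error can be illustrated by drawing repeated samples from the ages listed above. This is only a sketch: the list on the slide contains 39 values although n=40 is stated, so the mean computed here (about 37.3) differs slightly from the quoted 37.5, and the random samples will not reproduce the five estimates on the slide.

```python
import random
import statistics

# Ages as listed on the slide (39 values appear in the list).
ages = [24, 21, 23, 16, 17, 56, 60, 64, 58, 57, 60, 47, 42, 41, 40,
        22, 35, 38, 40, 41, 49, 19, 19, 20, 35, 27, 28, 29, 30, 71,
        66, 21, 23, 26, 27, 30, 31, 45, 55]

population_mean = statistics.mean(ages)  # the population parameter

rng = random.Random(42)  # fixed seed only so that the draw is repeatable

# Five separate simple random samples of 10 members each.
sample_means = [statistics.mean(rng.sample(ages, 10)) for _ in range(5)]

print(f"Population mean: {population_mean:.1f}")
for i, estimate in enumerate(sample_means, start=1):
    # The gap between each estimate and the parameter is the sampling error.
    print(f"Sample {i}: mean {estimate:.1f}, error {estimate - population_mean:+.1f}")
```

Each sample gives a different estimate even though all five come from the same population.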
Standard Error • Accuracy is often quoted in studies • e.g. “56% of customers were more than satisfied with service quality; this estimate is subject to a 2% error either way” • The 2% error is called the standard error • Measures statistical accuracy of the sample • Standard error decreases as sample size increases • Zero error when the sample is the population
Calculating the Standard Error • Standard error = sdev / (√n) • sdev: standard deviation of the sample • n: sample size Example • Random sample of 50 customers have a mean age of 23.4 and a standard deviation of 9.7 • Standard error = 9.7 / (√ 50) = 1.4 • Therefore, population mean is likely to be 23.4 +/-1.4 (i.e. range between 22.0-24.8 years)
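The customer example can be reproduced with a few lines of Python. The function name `standard_error` is an assumption for illustration; the figures (mean 23.4, standard deviation 9.7, n = 50) are those on the slide.

```python
import math

def standard_error(sdev, n):
    """Standard error of the mean: sample standard deviation / square root of n."""
    return sdev / math.sqrt(n)

mean_age = 23.4
se = standard_error(sdev=9.7, n=50)

print(f"Standard error: {se:.1f}")                       # 1.4
print(f"Range: {mean_age - se:.1f} to {mean_age + se:.1f}")  # 22.0 to 24.8

# Standard error decreases as sample size increases.
for n in (50, 200, 800):
    print(n, round(standard_error(9.7, n), 2))
```

The loop shows that quadrupling the sample size halves the standard error.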
Degrees of Confidence • Standard error doesn’t say how likely it is (i.e. how confident we can be) that the estimated range is correct • We use principles of standard deviation to determine the level of confidence in our estimated range
Standard Deviation • [Figure: normal distribution curve with the horizontal axis running from -3sd to +3sd around the mean; bands marked 68%, 95% & 99%] • 95% of responses fall within 2 sdev’s of the mean
Degrees of Confidence • Using 2 standard errors means we can be 95% confident (i.e. correct 95 times out of 100) that the sample mean will lie within 2 standard errors of the population mean • Calculating 95% confidence for the earlier example • Where we said that the population mean is likely to be 23.4 +/-1.4 (i.e. range between 22.0-24.8 years) • 23.4 +/- 2.8 (standard error of 1.4 x 2) provides a range of 20.6 to 26.2 • Therefore, we can be 95% confident that the population mean is between 20.6 and 26.2 years • Do the same for the 99% level of confidence…
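A short Python sketch of the confidence ranges for this example. The multipliers 1, 2 and 3 are the rule of thumb used in these slides for 68%, 95% and 99%; the exact normal-distribution multipliers (about 1.96 for 95% and 2.58 for 99%) are printed alongside for comparison. The function name `confidence_range` is hypothetical.

```python
from statistics import NormalDist

mean_age = 23.4
se = 1.4  # standard error from the earlier example

def confidence_range(mean, se, multiplier):
    """Return (low, high) for mean +/- multiplier x standard error."""
    return mean - multiplier * se, mean + multiplier * se

# Rule-of-thumb multipliers used in the slides.
rule_of_thumb = {68: 1, 95: 2, 99: 3}

ranges = {}
for level, multiplier in rule_of_thumb.items():
    ranges[level] = confidence_range(mean_age, se, multiplier)
    # Exact two-sided multiplier from the standard normal distribution.
    exact = NormalDist().inv_cdf(0.5 + level / 200)
    low, high = ranges[level]
    print(f"{level}%: {low:.1f} to {high:.1f} (exact multiplier {exact:.2f})")
```

With the rule of thumb, the 99% range is 23.4 +/- 4.2; using the exact multiplier would give a slightly narrower range.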
Acceptable Level of Confidence? • 68% of all sample means would fall within a range of +/- 1 standard error of the population mean • This means that we would be 68% confident that the population mean is between 22.0 & 24.8 years • The 68% level of confidence means there is a 32% chance of being incorrect • 95% is normally used as the acceptable level of confidence for statistical analysis
How Large Should the Sample be? • Sample size is NOT relative to population size! • Sample size is absolute • e.g. provided sampling procedures have been followed, a sample size of 1,000 is equally valid for a population of British adults (50mn), London residents (7mn) or Molde residents (24,000) • Sample size is determined by • The availability of resources • The purpose of data you intend to collect • The required level of accuracy in the results • The required level of confidence