The 4 Sampling Techniques: Review and Critical Evaluation
SKILLS NEEDED:
• Memorise the descriptions of Simple Random, Stratified, Cluster and Systematic sampling, especially the advantages and disadvantages of using each method.
• Read the notes on the next few slides (memorise the key points).
• Do Old Sigma: Ex. 9.2–Q1, Ex. 9.3, Ex. 24.9.
Sampling techniques you need to be able to describe
Good sampling techniques (you must be able to describe each of these and give the advantages and disadvantages of using it):
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
Bad sampling techniques (biased selection processes):
• Convenience sampling (e.g. going to the Square and interviewing the first n people you see).
• Self-selected sampling (e.g. people phoning in).
Simple Random Sampling
1. Obtain a list of all N items in the target population and number them 1 to N (e.g. the school roll: 1–600).
2. Decide how many items you will select for your sample (n).
3. Use the random number generator on your calculator to select numbers at random between 1 and N. On the calculator, type: 1 + N × RAN# (N = population size) and use the whole-number part of the result.
4. Keep pressing 'equals' until you have selected n different items (discard any repeats).
Advantage of simple random sampling:
• Every item in the population has an equal chance of being selected, so by definition there is no selection bias.
Potential disadvantages:
• Under- or over-representation of some sub-groups: it does not ensure that all sub-groups of the population (e.g. ethnicities, South Islanders) are represented in proportion.
• Inconvenient data collection (when sampling from a large geographical area): the chosen items are scattered randomly all over the area, with no groupings or system.
Do Old Sigma (2nd edition) pg. 134 – Ex. 9.2: Q1 only.
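21.03 Simple random sampling
Generate 20 different random numbers between 1 and 100. If a random number has already occurred, generate more as needed.
Calculator formula: 1 + 100 × RAN#
Random numbers obtained: 42, 67, 2, 12, 77, 49, 60, 20, 45, 15, 64, 7, 8, 21, 15, 64, 58, 14, 29, 68, 26, 90
(15 and 64 each occur twice, so the repeats are discarded, leaving 20 different items.)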
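The calculator steps for simple random sampling can be mimicked in Python. This is only a rough sketch: the function name `simple_random_sample` and the `rng` parameter are illustrative choices, not part of the course material. Each draw copies 1 + N × RAN# and keeps the whole-number part, and repeats are discarded just as on the slide.

```python
import random

def simple_random_sample(N, n, rng=random):
    """Select n different item numbers from 1..N, mimicking the calculator method.

    Each draw imitates 1 + N x RAN#: a random decimal from 1 up to (but not
    including) N + 1, whose whole-number part is an item number from 1 to N.
    """
    if not 0 <= n <= N:
        raise ValueError("sample size must be between 0 and N")
    chosen = []
    while len(chosen) < n:
        item = int(1 + N * rng.random())  # whole-number part of 1 + N x RAN#
        if item not in chosen:            # discard any repeats
            chosen.append(item)
    return chosen

# 20 different items from a population numbered 1 to 100 (seeded so it can be re-run)
print(simple_random_sample(100, 20, random.Random(1)))
```

Python's built-in `random.sample(range(1, N + 1), n)` produces the same kind of sample (n different items, each equally likely to be chosen) in a single call.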
Systematic Sampling
1. Obtain a list of all N items in the target population and number the items from 1 to N (the sampling frame).
2. Decide on your sample size, n.
3. Pick a random starting point in your list by generating a random number from 1 to N (e.g. item number 7).
4. Sample every kth item after that, where k = N/n, until you have selected n items. Cycle back to the start if you reach the end of the list before you have n items.
Advantages:
• Ensures that the sample is selected from throughout the breadth of the sampling frame.
• Convenient and fast: it is easier to collect information on items that are in a sequence (e.g. every 5th house) than on a random sample whose items are scattered all over.
Disadvantage:
• Be careful that the list itself has no regular pattern that matches your systematic selection (i.e. where every kth item shares similar characteristics in the variable you are measuring). E.g. if every 2nd house on a street were sampled, all the sampled houses would be on the same side of the street!
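The street example can be checked with a small Python sketch. The street itself is hypothetical: 20 houses, with odd numbers on one side and even numbers on the other, and `every_kth` is an illustrative helper name. With k = 2, whichever starting house is chosen, every sampled house lands on the same side.

```python
# Hypothetical street: houses numbered 1 to 20, odd numbers on the left side
# of the street and even numbers on the right.
houses = {number: ("left" if number % 2 == 1 else "right") for number in range(1, 21)}
street = list(houses)  # the sampling frame: house numbers in order, 1..20

def every_kth(frame, start, k):
    """Systematic selection: the item at position `start` (1-based), then every kth item."""
    return frame[start - 1::k]

for start in (1, 2):
    picked = every_kth(street, start, 2)
    sides = {houses[h] for h in picked}
    print(f"start at house {start}: {picked} -> sides sampled: {sides}")
```

Starting at house 1 gives only left-side houses and starting at house 2 gives only right-side houses, so the sample never sees the other side of the street.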
21.03 Systematic sampling
To obtain a systematic sample of size 20 from this data, choose a starting point at random between 1 and 100.
Using the calculator: 1 + 100 × RAN#. Suppose this gives 5.87352; its whole-number part is 5, so start at item number 5.
Then choose every kth item, where k = N/n = 100/20 = 5. So sample every 5th item: items 5, 10, 15, ..., 100.
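The four systematic-sampling steps, including cycling back to the start of the list, can be sketched in Python. The function name `systematic_sample` is hypothetical, and the sketch assumes that when N/n is not a whole number, k is rounded down; with that choice the n items never repeat.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Systematic sample of n item numbers from a sampling frame numbered 1..N.

    Starts at a random item (or the given `start`), then takes every kth item,
    cycling back to the beginning of the list if the end is reached.
    """
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and N")
    k = N // n  # assumption: round N/n down when it is not a whole number
    if start is None:
        start = int(1 + N * rng.random())  # like 1 + N x RAN#, whole-number part
    # (start - 1 + i*k) % N + 1 wraps item numbers back into the range 1..N
    return [(start - 1 + i * k) % N + 1 for i in range(n)]

print(systematic_sample(100, 20, start=5))
```

With `start=5` this reproduces the slide: items 5, 10, 15, ..., 100. A start of 7 runs 7, 12, ..., 97 and then cycles back to item 2 for the 20th item.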
Stratified Sampling
Use when the population consists of categories (strata) and you wish to represent each stratum proportionally, e.g. racial groups, or one-storey and multi-storey homes within a city.
1. Obtain a list of all N items in the target population (the sampling frame).
2. Decide on your sample size, n.
3. Divide the sampling frame (list) into the strata (categories).
4. Select a separate random sample from each stratum, in proportion to the percentage of the population found in each.
Proportional allocation: selecting from each stratum in proportion to its percentage of the population. E.g. if 12% of a city's citizens are Pacific Islanders, then 12% of the sample should be selected from among the Pacific Island citizens.
Advantage: the sample is guaranteed to represent each stratum in proportion to its share of the population.
Disadvantage: time-consuming and expensive, because you must collect information about the strata sizes in advance.
21.03 Stratified sampling
Suppose the avocados are of 3 different varieties:
Hass: items 1–40 (40%)
Fuerte: items 41–70 (30%)
Hopkins: items 71–100 (30%)
The number from each stratum in the sample should be proportional to the size of that group in the population:
Hass: 40% × 20 = 8
Fuerte: 30% × 20 = 6
Hopkins: 30% × 20 = 6
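21.03 Stratified sampling (continued)
Thus generate random numbers as follows:
Hass (1–40), 8 random numbers: 33, 17, 12, 25, 9, 9, 33, 16, 39, 8
Fuerte (41–70), 6 random numbers: 58, 59, 67, 43, 53, 56
Hopkins (71–100), 6 random numbers: 98, 85, 96, 99, 90, 81
(For Hass, 9 and 33 each occur twice, so the repeats are discarded, leaving 8 different avocados.)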
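Proportional allocation followed by a separate simple random sample from each stratum can be sketched in Python for the avocado example. The function name `stratified_sample` is hypothetical, and rounding each stratum's share to the nearest whole number is an assumption; for other populations the rounded sizes may not add up to exactly n, and would then need adjusting by hand.

```python
import random
from fractions import Fraction

def stratified_sample(strata, n, rng=random):
    """Proportional allocation, then a separate simple random sample from each stratum.

    `strata` maps each stratum name to the list of item numbers in that stratum.
    Returns (allocation, sample), both keyed by stratum name.
    """
    N = sum(len(items) for items in strata.values())
    allocation = {}
    for name, items in strata.items():
        # stratum's share of the population x sample size, e.g. 40% x 20 = 8;
        # assumption: round to the nearest whole number if it is not exact
        allocation[name] = round(Fraction(len(items), N) * n)
    sample = {name: sorted(rng.sample(items, allocation[name]))
              for name, items in strata.items()}
    return allocation, sample

avocados = {
    "Hass": list(range(1, 41)),       # 40% of the 100 avocados
    "Fuerte": list(range(41, 71)),    # 30%
    "Hopkins": list(range(71, 101)),  # 30%
}
allocation, sample = stratified_sample(avocados, 20, random.Random(21))
print(allocation)  # {'Hass': 8, 'Fuerte': 6, 'Hopkins': 6}
print(sample)
```

Using `Fraction` keeps the percentages exact, so 40% × 20 comes out as exactly 8 rather than a floating-point approximation.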
Cluster Sampling
Use when the population is distributed into naturally occurring groups or 'clusters' (e.g. towns and cities in a country; purchases from a national franchise like Noel Leemings, which can be grouped by retail outlet).
Basic cluster sampling consists of 2 stages of selection:
Stage 1: Select the clusters. Select a representative sample of the clusters themselves (usually there are too many to sample from all of them).
Stage 2: Select the items within the chosen clusters. Select a random sample of items from within each chosen cluster, maintaining proportional allocation.
E.g. select a sample of New Zealanders by choosing a sample of towns/cities from throughout the country, then a proportional random sample from within each.
*Multi-stage cluster sampling: In reality, most cluster sampling for real-life research uses more than 2 stages. E.g. select a sample of cities, then a sample of suburbs from within each chosen city, then a sample of streets from each chosen suburb, then a sample of houses within each chosen street (4 stages)!
Advantage of cluster sampling:
• The data-collection process is convenient, particularly when sampling from a geographically large area (data can be collected in groups within the chosen clusters rather than being spread out).
Disadvantages of cluster sampling:
• Potential bias, as items don't have an equal chance of selection.
• Very small clusters are unlikely to be sampled from.
• Items that are not in clusters are excluded altogether. E.g. farmers or people in small rural communities may have no chance of being selected for a national survey of households.
• Can underestimate the variation in the population, because items that are close together within a cluster may have similar characteristics.
• Requires prior knowledge of cluster sizes.
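The two stages of basic cluster sampling can be sketched in Python. Everything here is illustrative: the towns and their sizes are hypothetical, stage 1 simply picks clusters at random (a real survey might choose them differently to make them representative), and the "largest remainder" rounding in stage 2 is one possible way of making the proportional sizes add up to exactly n.

```python
import random
from fractions import Fraction

def two_stage_cluster_sample(clusters, m, n, rng=random):
    """Stage 1: pick m clusters at random. Stage 2: a proportional random sample
    of n items in total from within the chosen clusters.

    `clusters` maps each cluster name to the list of items in it.
    """
    chosen = rng.sample(sorted(clusters), m)  # stage 1: choose the clusters
    total = sum(len(clusters[c]) for c in chosen)
    if n > total:
        raise ValueError("chosen clusters hold fewer than n items")
    quota = {c: Fraction(n * len(clusters[c]), total) for c in chosen}
    alloc = {c: int(q) for c, q in quota.items()}  # whole-number parts
    # hand the remaining places to the largest fractional parts, so sizes sum to n
    leftover = n - sum(alloc.values())
    for c in sorted(chosen, key=lambda c: quota[c] - alloc[c], reverse=True)[:leftover]:
        alloc[c] += 1
    return {c: rng.sample(clusters[c], alloc[c]) for c in chosen}  # stage 2

# Hypothetical towns (clusters) with 500, 300, 200 and 100 residents
towns = {
    "Town A": [f"A{i}" for i in range(1, 501)],
    "Town B": [f"B{i}" for i in range(1, 301)],
    "Town C": [f"C{i}" for i in range(1, 201)],
    "Town D": [f"D{i}" for i in range(1, 101)],
}
for town, people in two_stage_cluster_sample(towns, m=2, n=30, rng=random.Random(7)).items():
    print(town, len(people))
```

Multi-stage cluster sampling repeats stage 1 inside each chosen cluster (cities, then suburbs, then streets) before the final random selection of items.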
21.03 Cluster sampling
Here is one way of obtaining a cluster sample of size 20. Choose four clusters, each of 5 avocados, by selecting four numbers at random from the data and taking each as the middle item of a 'cross'. If clusters overlap or run outside the boundaries, choose another number.
Spreadsheet formula: 99 × RAN# + 1
Random numbers obtained: 62, 22, 2, 68, 56
Note: Depending on how a cluster is defined, it can exclude some items, or make other items more likely to be chosen than under other sampling methods.
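The 'cross' rule can be sketched in Python, but only under an assumption the slide does not state: that the 100 avocados are laid out in a 10 × 10 grid, numbered row by row. The names `cross` and `cross_cluster_sample` are hypothetical. A cross is the middle item plus the items directly above, below, left and right of it.

```python
ROWS, COLS = 10, 10  # assumption: the 100 avocados form a 10 x 10 grid, numbered row by row

def cross(centre):
    """Items in the 'cross' centred on `centre` (itself plus the items above, below,
    left and right), or None if the cross runs outside the grid."""
    r, c = divmod(centre - 1, COLS)
    cells = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    if any(not (0 <= rr < ROWS and 0 <= cc < COLS) for rr, cc in cells):
        return None
    return {rr * COLS + cc + 1 for rr, cc in cells}

def cross_cluster_sample(centres, wanted=4):
    """Take crosses in the order drawn, skipping any that run outside or overlap."""
    clusters, used = [], set()
    for centre in centres:
        items = cross(centre)
        if items is None or items & used:
            continue  # choose another random number instead
        clusters.append(sorted(items))
        used |= items
        if len(clusters) == wanted:
            break
    return clusters

print(cross_cluster_sample([62, 22, 2, 68, 56]))
# [[52, 61, 62, 63, 72], [12, 21, 22, 23, 32], [58, 67, 68, 69, 78], [46, 55, 56, 57, 66]]
```

Under this assumed layout, item 2 sits in the top row, so its cross runs outside the grid and is rejected; the other four numbers give four non-overlapping clusters of 5, i.e. 20 avocados.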
21.03 Comparison of samples: figure comparing the simple random, systematic, cluster and stratified samples.
Practice for scholarship questions:
1. Memorise the 4 types of sampling techniques and the advantages and disadvantages of each (*asked in the scholarship exam every year!*).
2. Do Old Sigma (2nd edition): pg. 137 – Ex. 9.3.
3. Then pg. 399 – Ex. 24.9.
21.04 Here is a brief description of an investigation into house prices.
Hypothesis: The median price that a house is sold for in Auckland is $350 000, and the distribution of house prices is not symmetric.
Sampling method: A mixture of cluster sampling and systematic sampling is used to obtain the data.
Data processing: The data is entered into a spreadsheet, sample statistics are calculated and appropriate graphs produced.
Conclusions from the data: The sample data shows that the median price is only $281 000. A boxplot shows that the distribution of house prices is not symmetric.
Critically evaluate the sampling process, described in detail on the next slide.
21.04 The sampling method
A mixture of cluster sampling and systematic sampling is chosen.
Cluster sampling: There are nine pages (numbered 1811 to 1819) of real estate agents in the Auckland Yellow Pages. Each page has 4 columns, with a maximum of about 50 phone numbers in each column. A random number generator or spreadsheet formula is used to get the 4 clusters (i.e. 4 real estate offices), generating a page, a column and an entry for each:
Page (1811 to 1819): =RANDBETWEEN(1811,1819)
Column (1 to 4): =RANDBETWEEN(1,4)
Entry (1 to 50): =RANDBETWEEN(1,50)
Systematic sampling: Each of the 4 locations obtained is visited, and all prices in every second row of their office windows are recorded.
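The three RANDBETWEEN formulas can be mirrored in Python with `random.randint`, which, like RANDBETWEEN, includes both end points. This is a sketch under stated assumptions: the name `pick_offices` is hypothetical, every column is treated as holding 50 entries, and a position that has already been drawn is simply drawn again.

```python
import random

def pick_offices(count=4, rng=random):
    """Draw `count` different (page, column, entry) positions in the Yellow Pages
    listing, one random integer per coordinate like the three RANDBETWEEN formulas.

    Assumptions: every column is treated as holding 50 entries, and a position
    that has already been drawn is simply drawn again.
    """
    picks = set()
    while len(picks) < count:
        page = rng.randint(1811, 1819)  # like =RANDBETWEEN(1811,1819)
        column = rng.randint(1, 4)      # like =RANDBETWEEN(1,4)
        entry = rng.randint(1, 50)      # like =RANDBETWEEN(1,50)
        picks.add((page, column, entry))
    return sorted(picks)

print(pick_offices(rng=random.Random(2)))
```

A position past the end of a shorter column, or an agent listed more than once, still has to be dealt with by hand; as the critical evaluation of this investigation points out, agents listed more than once are more likely to be chosen.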
21.04 Justify the sampling method
Justify the use of cluster sampling: It would not be practical to visit a large number of real estate offices and select only a few house prices from each. A cluster sample is therefore faster and more convenient to obtain.
Justify the use of systematic sampling: Taking a systematic sample at each location adds to the convenience. Because the data can be collected comparatively quickly, a 'snapshot' of prices at the same point in time is possible.
21.04 Critically evaluate the sampling process
Note that the sample is almost certainly biased and has many faults.
Comment on the choice of clusters: The number of clusters is too low, and the clusters differ substantially from one another. (The characteristics of Auckland suburbs mean some have low and fairly uniform prices, while others, like Milford, have very expensive beachfront property.) Further preliminary research would help select areas for inclusion in a stratified sample, which would reflect the population more accurately.
Comment on the use of systematic sampling: At each location, all prices in every second row of the office window are recorded. One particular row of advertisements and photos may show only bargains, or only better-quality houses, and some houses for sale may not be advertised in the window at all.
Comment on the quality of the data: As well as not supplying the information required by the hypothesis (window prices are advertised prices, not the prices houses actually sell for), the data is incomplete and has duplications.
• Several selling methods are excluded, for example sale by tender or auction, where the price is unknown.
• The list in the Yellow Pages is not exhaustive: private sales, for example, are excluded.
• Some houses are listed with more than one real estate agent, so they may be recorded twice.
• Some real estate agents are listed more than once, so they are more likely to be chosen.
• There is an unequal number of agents listed in each column.
• Some areas of Auckland may be over-supplied with real estate offices, again making these areas more likely to be selected.
Comment on how better data could be obtained: The sampling process could be improved by gaining access to the actual sales data used by real estate agents (not easy, because they guard it jealously!). This data would answer the question directly and should include all sales within a particular time period, with no duplication, so it would be representative of the population.