Field Research Methods

Field Research Methods Lecture 4: Sampling & Sampling Error Edmund Malesky, Ph.D., UCSD

Probability Sampling • Each element has a known, nonzero chance of being included in the sample. • Selection bias is avoided. • Statistical theory can be used to derive properties of the survey estimators. • The alternative is non-probability sampling (volunteers, convenience, expert selection). • Non-probability sampling is highly subjective, which precludes a theoretical framework for analysis.

Organization • The Sampling Frame • Types of Sampling • Sample Size • Sampling Error

1. Tools • A definition of the population you wish to study (e.g. households, firms, customers, foreign policy elites…) • The sampling frame: More or less, a complete list of the individuals in the population to be studied. • Lists can be obtained through census bureaus, tax offices, or other agencies. • If a list does not exist, you will need to create one. This usually necessitates performing your own census or a multi-stage research design, where the first stage is identification of the population. • Sometimes the sampling frame will be individuals who went somewhere or did something, allowing them to be sampled (visits to the hospital, meeting attendance). • In this case, be wary of selection bias. These individuals did not arrive in your population accidentally. They selected themselves into it (North Korean refugees in China, UCSD students…).

Cambodian Sampling Frame

Types of Sampling • Simple Random Sampling – Each individual has the same probability of being selected.
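A minimal sketch of simple random sampling, assuming the sampling frame is available as a list. The frame of 8,500 household labels and the function name are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

# Hypothetical sampling frame of 8,500 households.
frame = [f"household_{i}" for i in range(1, 8501)]
sample = simple_random_sample(frame, 100, seed=42)
print(len(sample))
```

Fixing the seed makes the draw reproducible, which helps when the sample has to be documented or audited later.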

Types of Sampling • Systematic Sampling – Divide the desired sample size by the population size (s/p). This will give you a selection ratio (100/8,500 = 1/85). Thus, 1 out of every 85 people should be selected. Select a random starting point among the first 85 entries on the list, then take every 85th entry after it. • Warning: If there is any pattern to the ordering of the list (age, name…), the sample may be unrepresentative, especially if the pattern repeats at a regular interval.
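The systematic procedure can be sketched as follows, assuming the frame is a list whose length is a multiple of the sample size (as in the 8,500 / 100 example). The function name and the numeric frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, where k = len(frame) // n."""
    k = len(frame) // n           # sampling interval, e.g. 8500 // 100 = 85
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start within the first interval
    return frame[start::k][:n]    # slice guards against an extra unit at the end

frame = list(range(1, 8501))      # hypothetical list of 8,500 people
picked = systematic_sample(frame, 100, seed=1)
print(picked[:3])
```

The random start is what gives every entry the same 1-in-85 chance of selection; a fixed start would always pick the same entries.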

Types of Sampling • Stratified Random Sampling – Used when you are worried that normal sampling variation will lead to unrepresentative sub-groups.

Stratified Sampling • Simple Random Sampling: Will give me the percentage of balls of a certain color (plus/minus 3%). • If I want to be more certain, I stratify and randomly sample within each category: 15% Yellow, 10% Green, 50% Blue, 25% Red.
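A sketch of the colored-ball example with proportional allocation, assuming a hypothetical urn of 1,000 balls split 15/10/50/25 as on the slide. Each stratum contributes to the sample in proportion to its share of the population, so the color mix in the sample matches the urn exactly rather than only on average.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Randomly sample within each stratum, proportionally to stratum size."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounding can make the total differ slightly from n for awkward shares.
        n_h = round(n * len(units) / total)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical urn: 1,000 balls with the color shares from the slide.
urn = {
    "yellow": [f"y{i}" for i in range(150)],
    "green": [f"g{i}" for i in range(100)],
    "blue": [f"b{i}" for i in range(500)],
    "red": [f"r{i}" for i in range(250)],
}
drawn = stratified_sample(urn, 100, seed=7)
counts = {color: len(balls) for color, balls in drawn.items()}
print(counts)
```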

Types of Sampling • Area (Cluster) Probability Sampling: Draw a representative sample of geographical units, then survey the individuals within each selected unit. • Two-Stage: Like cluster sampling, except that a random sample of individuals is drawn within each selected unit.
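A rough two-stage sketch, assuming the geographical units are stored as a dictionary mapping a unit name to its list of residents. The village names, sizes, and function name are made up for illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters at random. Stage 2: sample individuals within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical frame: 20 villages with 50 households each.
villages = {
    f"village_{v}": [f"v{v}_hh{h}" for h in range(50)] for v in range(20)
}
result = two_stage_sample(villages, n_clusters=5, n_per_cluster=10, seed=3)
print({name: len(households) for name, households in result.items()})
```

This version picks clusters with equal probability; when clusters differ a lot in size, selection proportional to size is a common alternative that is not shown here.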

Determining Sample Size • In a Simple Random Sampling Design, we only need to know three things: • The population size • The variability of the parameter • The desired level of precision & confidence • If you are interested in a proportion of the population, substitute P(1-P) for the variance term (S², where S is the standard deviation).

Example (Iarossi, p. 100) • Population = 650,000 • We are willing to accept a margin of error of 3% • We decide we would like a 95% confidence interval. • We assume that the residents are equally split between supporters and opponents.
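The example can be worked through numerically. The sketch below assumes the standard sample-size formula for a proportion with a finite population correction, n0 = z²P(1-P)/e² and n = n0 / (1 + (n0 - 1)/N); the exact form printed in Iarossi may differ slightly, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def srs_sample_size(population, margin, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion under simple random sampling."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    n0 = z**2 * p * (1 - p) / margin**2                  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                 # finite population correction
    return math.ceil(n)

n = srs_sample_size(population=650_000, margin=0.03)
print(n)  # 1066
```

With a population this large the correction barely matters: the uncorrected value is about 1,067, which is why national polls of very different countries use similar sample sizes.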


Sample Size: Stratified Sample (Neyman Allocation)
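As a sketch of Neyman allocation: a total sample of size n is divided across strata in proportion to N_h × S_h, the stratum size times the stratum standard deviation, so larger and more variable strata receive more interviews. The stratum names and numbers below are hypothetical, and in practice the S_h values have to come from a pilot or earlier survey.

```python
def neyman_allocation(n, sizes, std_devs):
    """Allocate n across strata: n_h = n * N_h * S_h / sum(N_k * S_k)."""
    weights = {h: sizes[h] * std_devs[h] for h in sizes}
    total = sum(weights.values())
    # Rounding can make the allocations sum to slightly more or less than n.
    return {h: round(n * w / total) for h, w in weights.items()}

# Hypothetical strata: population sizes and assumed standard deviations.
sizes = {"urban": 100, "rural": 200}
std_devs = {"urban": 1.0, "rural": 2.0}
alloc = neyman_allocation(50, sizes, std_devs)
print(alloc)  # {'urban': 10, 'rural': 40}
```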

Executing a Stratified Sample in Stata • How to? • More detail

The Mystery of Margin of Error A joke from Joe Klein: “Prime Minister Ehud Olmert is testing the limits of the possible: in a recent poll by a local television station, he had a favorable rating of 3%. Given the poll's margin of error, it was possible Olmert had no support beyond his extended family.”

The Mystery of Margin of Error

Sampling Error • The potential variation due to measuring a sample rather than the entire population. • The margin of error is the half-width of the confidence interval (usually reported at a 95% confidence level).

Sampling Error • Notice: Sampling Error Decreases when: • Sample Size Increases • The Estimated Proportion approaches 0 or 100% (it is usually assumed to be 50%, the worst case) • The Confidence Level Is Lowered (e.g. 90% instead of 95%) • Also Notice: • Margin of Error is not a measure of other types of error (bias, non-response, measurement) – only sampling error. • When comparing two candidates, the margin of error applies to both numbers, so the uncertainty about the gap between them is larger than the reported margin.
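These relationships can be checked with the usual normal-approximation formula for a proportion, margin = z × sqrt(P(1-P)/n). This is a sketch under that assumption; the sample sizes below are illustrative, not taken from any particular poll.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

base = margin_of_error(0.5, 1000)                         # worst case, roughly 3.1 points
bigger_n = margin_of_error(0.5, 4000)                     # four times the sample
extreme_p = margin_of_error(0.03, 1000)                   # proportion near 0
lower_conf = margin_of_error(0.5, 1000, confidence=0.90)  # lower confidence level

for label, value in [("base", base), ("n x 4", bigger_n),
                     ("p = 3%", extreme_p), ("90% level", lower_conf)]:
    print(f"{label}: {value:.3f}")
```

Quadrupling the sample only halves the margin, which is why precision gets expensive quickly. Note also that the normal approximation becomes unreliable for proportions very close to 0 or 100%.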

Remember • With a 95% confidence level, about 1 out of every 20 samples will produce a confidence interval that does not contain the true population value. • It is impossible to determine whether the actual population value falls within the CI for the results of a particular survey. • Blind roulette: Imagine a roulette wheel where 95% of the slots are red. Each time we spin, we know we have a 95% chance of hitting red. The problem with a survey is that we cannot see the colors.
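The blind-roulette idea can be illustrated by simulation, where, unlike in a real survey, the true value is known. The sketch below assumes a true proportion of 50%, samples of 500 respondents, and 1,000 repeated surveys; all of these numbers and the function name are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(true_p=0.5, n=500, surveys=1000, confidence=0.95, seed=0):
    """Share of simulated surveys whose confidence interval contains true_p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(surveys):
        # One simulated survey: n independent yes/no answers.
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / surveys

rate = coverage_rate()
print(f"{rate:.1%} of intervals contain the true value")
```

The rate should land close to 95%, but nothing in any single simulated survey reveals whether it is one of the misses.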

Root Mean Squared Error for Categories
