Survey Sampling - 1 Introduction & Terminology
What is a Survey? A survey is a: • Systematic method for gathering information • From a sample of entities • For the purposes of constructing quantitative descriptors • Of the attributes of the larger population of which the entities are members
Surveys are just one form of data collection, but they are unique in that: • Data are gathered via a process of asking and answering questions • Data are often “generated” at the time the question is asked • Hence, data accuracy depends heavily on the design of the survey and the survey process • For example: • Does the survey design minimize human error? • Does the survey process engender goodwill and motivate those taking the survey?
Surveys are not the only (nor necessarily the best) way to collect data • Other methods include • Census • Administrative Records • Focus groups and qualitative investigations • Randomized experiments • Which is “best” depends on the research question(s) and/or the purpose for which the data will be used
Census vs. Survey CENSUS Everyone in the population is asked the same questions. • Expensive • Slow • Huge practical operation • Small number of questions • Population values SURVEY A sample of the population is drawn and interviewed • Cheaper • Faster • More feasible • Larger number of questions possible • Sampling errors • Sample estimates
Administrative vs. Survey ADMINISTRATIVE Information collected as a by-product of an Administrative function • Detailed and high accuracy • Narrow focus • Objective measures only • Small number of questions • Population values SURVEY A sample of the population is drawn and interviewed • Cheaper • Faster • More feasible • Larger number of questions possible • Sampling errors • Sample estimates
There are lots of types of surveys • We will focus on surveys where • Information is primarily gathered by asking people questions • Information is collected by either • interviewers asking questions and recording responses • respondents reading and recording their own answers • Information is only collected from a subset of the population –a sample –rather than from all members
Poll vs. Survey • There is no clear distinction between the two terms • “Poll” is most often used for private-sector opinion studies • Polls use many of the same design features as studies that would be called surveys • “Poll” is rarely used to describe government or scientific surveys • “Poll” tends to suggest • a commercial or less-scientific study, or • a quick turn-around survey whose results may be of short-term interest
Sampling: Why sample? • By taking a sample of the population we should get a fairly accurate picture of the whole • Sampling theory tells us that under certain circumstances we can draw inferences from a fraction of the population • The sample must be random, or at least every unit must have a known probability of selection • The greater the sample size, the more precise the estimates (but with diminishing returns)
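The diminishing returns from larger samples can be made concrete with the standard error of a sample proportion under simple random sampling, sqrt(p(1 - p) / n). The following is a minimal sketch in R; the proportion p = 0.5 and the sample sizes are assumptions chosen for illustration, and the finite population correction is ignored.

```r
# Standard error of a sample proportion under simple random sampling.
# p = 0.5 is an assumed (worst-case) population proportion for illustration.
se_prop <- function(n, p = 0.5) sqrt(p * (1 - p) / n)

n <- c(100, 400, 1600, 6400)
se <- se_prop(n)

# Each fourfold increase in n only halves the standard error.
print(data.frame(n = n, se = se, approx_95_margin = 1.96 * se))
```

Quadrupling the sample size is needed to halve the margin of error, which is why very large samples are rarely worth their cost.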
Sampling Frames There is a classic Jimmy Stewart movie, Magic Town, about "Grandview," a small town in the Midwest that is a perfect statistical microcosm of the United States, a place where the citizens' opinions match perfectly with Gallup polls of the entire nation. A pollster (Jimmy Stewart) secretly uses surveys from this "mathematical miracle" as a shortcut to predicting public opinion. Instead of collecting a national sample, he can more quickly and cheaply collect surveys from this single small town. The character played by Jane Wyman, a newspaper editor, finds out what is going on and publishes her discovery. As a result the national media descend upon the town, which becomes, overnight, "the public opinion capital of the U.S." Usually sampling is not that easy! Dr. R.S. Albayrak
Sampling: Types of sample • Random • The sample is selected completely at random • Probability • There is a known probability of selection; usually through clustering or stratification of the sample… • Nonprobability Sampling • The probability of selection of the samples cannot be calculated. Examples are convenience (haphazard or accidental), snowball, judgment (purposive), deviant case, quota sampling
(Simple) Random Sampling • A sample is selected so that all samples of the same size have an equal chance of being selected from the entire population. • Not an easy concept. • Hard to achieve in practice. • Different from haphazard sampling. • In haphazard sampling designs, the selection probabilities of some samples are unknown (not calculable).
Random or Haphazard? • Interviewing students who pass through the front gate. • Interviewing students who are selected from the database in the Registrar's Office.
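# Animate repeated simple random samples of 15 points from a 10 x 10 grid
x = cbind(rep(1:10, 10), gl(10, 10))
par(mar = rep(0.1, 4))
for (i in 1:100) {
  plot(x, pch = 19, col = "blue", axes = FALSE, ann = FALSE)
  # circle a fresh random sample of 15 of the 100 points in red
  points(x[sample(100, 15), ], col = "red", cex = 3, lwd = 2)
  Sys.sleep(1)
}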
Simple Random Sampling • Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. (We will frequently use this property throughout the class.) • A sample is usually not a perfect representation of the population from which it was drawn; the resulting random variation in the results is termed sampling error. • In the case of random samples, mathematical theory is available to assess the sampling error. Thus, estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate. • This can take the form of a standard error or, if the sample is large enough for the central limit theorem to take effect, confidence intervals may be calculated.
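As a rough illustration of attaching uncertainty to an estimate, the sketch below computes a standard error and an approximate 95% confidence interval for a sample mean. The data are simulated (200 independent draws from an assumed normal distribution), so every number in it is hypothetical.

```r
set.seed(1)
# Hypothetical sample: 200 independent draws from one distribution.
y <- rnorm(200, mean = 50, sd = 10)

n <- length(y)
est <- mean(y)
se <- sd(y) / sqrt(n)   # estimated standard error of the mean

# Approximate 95% confidence interval, relying on the central limit theorem.
ci <- est + c(-1, 1) * qnorm(0.975) * se
print(c(estimate = est, se = se, lower = ci[1], upper = ci[2]))
```

The interval is centred on the estimate and its width shrinks in proportion to 1 / sqrt(n).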
Systematic Sampling • Systematic Sampling is often used instead of random sampling. It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. • As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.
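The selection of every Nth record can be sketched in a few lines of R. The frame of 1,000 record IDs and the sample size of 50 are assumptions for illustration; a random start within the first interval is used, which is a common way to give each record the same chance of selection.

```r
set.seed(42)
# Hypothetical frame: a list of 1,000 record IDs.
frame <- 1:1000
n <- 50                       # required sample size
k <- length(frame) %/% n      # sampling interval: every k-th record

start <- sample(k, 1)         # random start between 1 and k
idx <- seq(from = start, by = k, length.out = n)
sys_sample <- frame[idx]
print(head(sys_sample))
```

If the list had a hidden periodic order with period k, every selected record would fall at the same point in the cycle, which is the risk the slide warns about.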
Sampling: Probability samples • Stratified Samples • Where sub-populations may vary, it can be advantageous* to sample each group independently. • For example, the sample can be stratified by ethnicity to ensure that it contains members of each ethnic group. • A similar argument holds for geographic location. • Strata (plural of stratum) MUST partition the population. *Building information about the population into the design makes the sample more representative and therefore reduces the sampling error.
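A stratified design can be sketched as an independent simple random sample drawn within each stratum. The example below assumes a hypothetical population of 1,000 people in three regions and a 10% sampling fraction in every stratum (proportional allocation); other allocations are possible.

```r
set.seed(7)
# Hypothetical population with three regions of unequal size.
pop <- data.frame(
  id = 1:1000,
  region = rep(c("North", "Central", "South"), times = c(600, 300, 100))
)
frac <- 0.10   # assumed sampling fraction, applied within every stratum

# Split row indices by stratum, then draw a simple random sample in each.
by_stratum <- split(seq_len(nrow(pop)), pop$region)
picked <- unlist(lapply(by_stratum, function(rows) {
  rows[sample.int(length(rows), round(frac * length(rows)))]
}))
strat_sample <- pop[picked, ]

print(table(strat_sample$region))
```

Because the strata partition the population, every person belongs to exactly one stratum, and even the small South region is guaranteed to appear in the sample.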
Sampling: Probability samples • Clustered Samples • To make it more practical for interviewers, clusters of sampling points are chosen at random, and then addresses within these clusters are chosen at random – this means that the addresses will be relatively close together
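The two-stage selection described above might look like the following sketch. The frame (50 clusters of 40 addresses each) and the stage sizes (5 clusters, 8 addresses per cluster) are hypothetical values chosen for illustration.

```r
set.seed(11)
# Hypothetical frame: 50 clusters (e.g. postcode sectors), 40 addresses each.
frame <- data.frame(
  cluster = rep(1:50, each = 40),
  address = 1:2000
)

# Stage 1: choose 5 clusters at random.
chosen <- sample(unique(frame$cluster), 5)

# Stage 2: choose 8 addresses at random within each chosen cluster.
rows <- unlist(lapply(chosen, function(cl) {
  in_cluster <- which(frame$cluster == cl)
  in_cluster[sample.int(length(in_cluster), 8)]
}))
cluster_sample <- frame[rows, ]
print(table(cluster_sample$cluster))
```

All 40 sampled addresses fall in only 5 clusters, which is what keeps interviewer travel manageable.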
Nonprobability Sampling • Convenience, Haphazard or Accidental sampling - Members of the population are chosen based on their relative ease of access. Sampling friends, co-workers, or shoppers at a single mall are all examples of convenience sampling. • Snowball sampling - The first respondent refers a friend. The friend also refers a friend, etc. • Judgmental sampling or Purposive sampling - The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people who have expertise in the area being researched. • Deviant Case - Select cases that differ substantially from the dominant pattern (a special type of purposive sample).
Nonprobability Sampling: Quota sample • The defining characteristic of a quota sample is that the researcher deliberately sets the proportions of levels or strata within the sample. This is generally done to ensure the inclusion of a particular segment of the population. The proportions may or may not differ dramatically from the actual proportions in the population. The researcher sets a quota, independent of population characteristics.
Example: Quota Sampling A researcher is interested in the attitudes of members of different religions towards the death penalty. In Iowa a random sample might miss Muslims (because there are not many in that state). To be sure of their inclusion, a researcher could set a quota of 3% Muslim for the sample. However, the sample will no longer be representative of the actual proportions in the population. This may limit generalizing to the state population. But the quota will guarantee that the views of Muslims are represented in the survey.
Properties of sample survey data Sample surveys aim to provide data that: • When analyzed statistically, can be generalized to the whole population (within known confidence intervals) • Are unbiased • Are standardized and collected with respect to a planned design • Are reliable and valid • Valid: the question should be measuring what you think it is measuring • Reliable: if you ask the same question more than once, you get the same distribution of answers
Sampling WHO? Who responds? • Individuals • Households • Establishments/organizations • Land • Animals or plants About… • Individuals • Events • Households • Other people in the household • The local area • An industry Key terms: unit of ANALYSIS, unit of OBSERVATION, ECOLOGICAL FALLACY
Survey Modes The way in which the survey is conducted • Face-to-face: An interviewer calls on the sample member in person and tries to conduct the interview at home • Telephone: An interviewer, usually from a central call-centre, telephones the household and tries to conduct the interview over the telephone • Postal: A self-completion questionnaire is sent through the post. The sample member fills it in and posts it back in a pre-paid envelope • Web-based: A self-completion questionnaire; the sample member is emailed a link to a survey to be completed on the internet.
Types of Survey • Cross-sectional: one-off, “snapshot” surveys. A sample of people is taken and interviewed once. These make up many government surveys and most of the market research and polling seen in newspapers, especially political polls. Used to analyze aggregate changes in the population.
Types of Survey • Longitudinal: Surveys with the dimension of time, often interviewing the same people at different times. Used to analyze individual changes in the population.
Why collect cross-sectional data? (1) • Good for describing the current situation • Different design and data collection variants • Sampling sub-populations (by age, geographical location) • Questionnaires • Diaries – time-use, nutritional, travel, expenditure • Responses to visual stimulus (e.g., branding) • Experimental design with control and treatment groups
Why collect cross-sectional data? (2) • Trend analysis using repeated cross-sections (Family Resources Survey, Citizenship Survey, General Household Survey) • Same questions each year • New sample each year • Look at aggregate change
Why collect cross-sectional data? (3) • Can include longitudinal element • Retrospective histories (employment, marriage, fertility, housing, migration…) • Questions about a reference period (in the last month/year, since leaving school)
Why collect longitudinal data? (1) • Provide a dynamic analysis of change rather than a snapshot
Why collect longitudinal data? (2) • Provide a dynamic analysis of change rather than a snapshot; net versus gross change • Better able to disentangle age and cohort effects • Model transitions into and out of particular states (stability analysis) • Better able to make causal inferences; A→B rather than A↔B
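The distinction between net and gross change can be illustrated with a tiny two-wave panel. The data below are invented for illustration: ten people observed twice, with the same number employed at both waves even though four individuals changed status.

```r
# Hypothetical two-wave panel: employment status of the same 10 people.
wave1 <- c("emp", "emp", "emp", "emp", "emp", "emp",
           "unemp", "unemp", "unemp", "unemp")
wave2 <- c("emp", "emp", "emp", "emp", "unemp", "unemp",
           "emp", "emp", "unemp", "unemp")

# Net change: difference in the aggregate proportion employed.
net_change <- mean(wave2 == "emp") - mean(wave1 == "emp")

# Gross change: share of individuals whose status changed.
gross_change <- mean(wave1 != wave2)

# Transition table (rows = wave 1, columns = wave 2).
transitions <- table(wave1, wave2)
print(transitions)
```

Two repeated cross-sections would show only the unchanged employment rate; the individual transitions in the table are visible only because the same people were followed over time.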
Survey Measurement • Concept – That which we are trying to measure. • Tolerance • Physical fitness • Poverty • Operationalization – The way in which we try to measure the concept • Willingness to live next door to people from different groups; Actually living next door to people from different groups • Resting heart-rate; Reported level of physical activity • Monthly income below a fixed amount; Self-identification as “poor” • Survey question(s) or direct observation.
Survey Measurement • Operationalizations of concepts are of two types:
Problems in Survey Questions • Ambiguous wording • Biasing or leading questions • E.g., Do you favor murdering babies in the womb? • Social desirability bias • Double-barrelled questions • E.g., Do you favor reducing Turkey’s dependence on foreign oil by increasing taxes on oil imports? • Double negatives • E.g., Last time you said you were not a supervisor or manager. Is that still the case? • Response options that are not mutually exclusive and exhaustive
Evaluating Measures Any measure can be evaluated by looking at its: • Validity • Is the question/observation technique measuring the concept we intend to measure? • Reliability • Is the question/observation technique measuring the concept consistently and dependably? • Reliable measures may not be valid measures • Valid measures are most likely reliable.
Non-response: what is it? Non-response is seen as a measure of survey quality • Unit non-response • When an individual does not take part in the survey at all because they cannot be contacted, refuse to take part or are incapable • Item non-response • When an individual takes part in the survey, but refuses or cannot answer particular questions
Non-response: why does it matter? • If random, then no problem* (*although the precision of estimates is still reduced by the smaller sample size) • Non-random/systematic non-response by some group leads to non-response bias • E.g., if employees are less likely to respond, estimates are biased toward the non-employed or unemployed • The goal of survey methodologists is not necessarily to reduce non-response, but to minimize non-response bias
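A small simulation can show why random non-response mainly costs precision while systematic non-response biases the estimate. Everything in the sketch below is assumed for illustration: a population that is 60% employed, a 50% response rate in the random case, and response rates of 30% for employed and 70% for non-employed people in the systematic case.

```r
set.seed(3)
n <- 10000
# Hypothetical population: 60% employed.
employed <- rbinom(n, 1, 0.6) == 1

# Random non-response: everyone responds with probability 0.5.
resp_random <- rbinom(n, 1, 0.5) == 1
# Systematic non-response: employed people respond less often (assumed rates).
resp_system <- rbinom(n, 1, ifelse(employed, 0.3, 0.7)) == 1

true_rate <- mean(employed)
est_random <- mean(employed[resp_random])   # close to the true rate
est_system <- mean(employed[resp_system])   # understates employment
print(c(true = true_rate, random = est_random, systematic = est_system))
```

Both scenarios lose roughly half the sample, but only the second one distorts the estimated employment rate, which is why the focus is on non-response bias rather than on the response rate alone.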
A Little Bit of History: Censuses • Used by states to know how many people they had for war efforts and taxation • Ancient Egypt (3340 BC) • Rome (c. 570 BC) • Greece (c. 600 BC) • Persia (500 BC) • India (c. 300 BC) • China (2 AD) • England (1086) • Ottoman Empire (1830)
Famous Examples (1) • King Rtuparna flaunts his mathematical skill by estimating the number of leaves and of fruit on two branches of a spreading tree. Apparently he does this on the basis of a single twig that he examines. There are, he avers, 2095 fruit. • Nala counts all night and is duly amazed by the accuracy of this guess.
Famous Examples (2) • An account of the Egyptians by the Greek historian Herodotus (circa 485-420 BC), cited by Rubin (1968, p. 31): “They declare that three hundred and forty-one generations separate the first king of Egypt from the last mentioned (Hephaestus) – and that there was a king and a high priest corresponding to each generation. • Now reckon three generations as a hundred years, • three hundred generations make ten thousand years, • and the remaining forty-one generations make 1,340 years more; thus one gets a total of 11,340 years ...”
Famous Examples (3) • P. S. Laplace used ratio estimation in the early 1800s • Wanted to determine the population of France in 1802 • The number of births is easy to obtain from public records • The size of the population is difficult to determine • Use births to predict population • Sampled 30 communes • Total population: 2,037,615 • Births in last 3 years: 215,599 (or 71,866.33 per year) • Persons per annual birth: 2,037,615 / 71,866.33 = 28.35 • Multiply the annual number of births in France by 28.35 to estimate the population
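Laplace's arithmetic can be reproduced as a simple ratio estimate. The sample figures below are the ones given on the slide; the national number of annual births is not given there, so the value of one million used in the last step is an assumption for illustration only.

```r
# Figures from Laplace's sample of 30 communes, as given above.
sample_population <- 2037615
sample_births_3yr <- 215599

births_per_year <- sample_births_3yr / 3
ratio <- sample_population / births_per_year   # persons per annual birth

# Assumed national figure, for illustration only (not given in the slides).
annual_births_france <- 1e6
estimated_population <- ratio * annual_births_france
print(c(ratio = ratio, estimated_population = estimated_population))
```

The estimate scales directly with the national birth count, so its accuracy depends on how well the 30 communes represent the ratio of population to births in the country as a whole.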
Sampling theory • Late 19th/early 20th century • 1895-1903: Kiaer (Norway) advocates research into “representative investigation”; total enumeration is not possible because of cost and detail, so he argues in favor of sampling • 1913: Bowley connects statistical theory and survey design, discussing random sampling, sampling frames, and PSUs (primary sampling units) • 1920s: Tschuprow/Chuprov develops stratified sampling • 1934: Neyman publishes on random sampling, stratification, cluster sampling, and linear estimators for large samples: a “major breakthrough” leading to modern scientific sampling