Cross-Cultural Surveys
Part of the Comparative Cross-national Electoral Research Programme, funded by the Economic and Social Research Council
Contact us: Mediaeffectsresearch.wordpress.com, @femalebrain
Cross-Cultural Surveys:
• Part I: Total Survey Error and survey quality (comparability, translation)
  • Data quality & the Total Survey Error approach
  • Sampling
  • Questionnaire design
• Part II: Data processing and statistical adjustment (e.g. survey weights)
  • Survey weights
  • Linking micro and macro data
Part I: Data Quality in Cross-Cultural Surveys
Total Survey Error Paradigm and CCS
Why Cross-Cultural Surveys?
• Represent opinion: what people think and what they report about their behaviour
• Represent people: describe a population
• Represent relationships: among attitudes and attributes
• Cross-cultural surveys (CCS) monitor and explain trends in attitudes, beliefs and values across countries
• We are interested in contextual/cultural influences
• Rapid growth in the number of surveys conducted across many countries, and increasing availability of these data
Major Cross-Cultural Surveys
[Chart: number of countries covered by each survey]
• CSES*, EES, ESS, ISSP*, WVS*
• Barometers: Euro, Latin, Asia, Asian, Afro
• Of the 218 countries rated by Freedom House (FH), 124 are present in at least one cross-national survey
• *includes USA and Canada
Data sets used today
• World Values Survey (WVS): Ron Inglehart
• European Social Survey (ESS): centrally funded and managed; Descartes Prize winner
• Comparative Study of Electoral Systems (CSES): country teams from national election studies, coordinated by the University of Michigan
CCS & Total Survey Error Approach I
• Total survey error (TSE) = all errors that may arise in the design, collection, processing and analysis of survey data
• Survey error is defined as: error = |true value − actual response|
• Errors can arise from sampling frame deficiencies, sampling error, interviewer and mode effects, and item and unit non-response…
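To make the error definition concrete, here is a minimal sketch assuming we have a (hypothetical) validation source, such as official turnout records, that gives the true value for each respondent. It computes the per-respondent error |true value − actual response| and, for contrast, the signed bias of the survey mean; the data and variable names are illustrative only.

```python
# Hypothetical validation data (assumption): (true value, reported value) per respondent,
# e.g. validated turnout from official records vs self-reported turnout (1 = voted).
validation = [(1, 1), (0, 1), (1, 1), (0, 0), (0, 1)]


def response_error(true_value, response):
    """Survey error for one respondent: |true value - actual response|."""
    return abs(true_value - response)


errors = [response_error(t, r) for t, r in validation]
mean_abs_error = sum(errors) / len(errors)

# Signed difference between the mean response and the mean true value:
# unlike the absolute error, it shows the direction of error (over-reporting > 0).
n = len(validation)
bias = sum(r for _, r in validation) / n - sum(t for t, _ in validation) / n

print(errors)          # [0, 1, 0, 0, 1]
print(mean_abs_error)  # 0.4
print(round(bias, 2))  # 0.4: turnout over-reported in this toy data
```

The absolute error treats over- and under-reporting alike; the signed bias shows that in this toy example all the error runs in one direction (over-reporting turnout).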
CCS & Total Survey Error Approach II
• CCS add further layers to the types of error, and to how errors may affect data quality:
  • Comparability
  • Translation: do concepts translate?
  • Time-use patterns and survey traditions may affect response rates
  • Sampling: selection of countries to study
Questionnaire Design & Development
• Objective: comparability
• Our goal is to be able to compare responses across a set of cultural contexts
• Differences should be due to underlying differences in values rather than to error (e.g. question wording)
• Do not assume “that the use of similar instruments administered under similar conditions is truly sufficient to ensure that respondents from different cultural groups will arrive at the same interpretations of survey items” (Harkness, Van de Vijver and Mohler 2003)
Achieving measurement equivalence with translation:
• Decentering: develop the questionnaire in two languages
• Back translation: after translating into other languages, translate back into the original language
• See the guidelines developed for the European Election Study
Survey (Unit) Non-response
• Response rates vary across countries: what are the implications for cross-cultural survey quality?
• Lower response rates may indicate poorer quality in terms of representativeness, but greater efforts to increase response rates may bring in ‘lazy’ respondents
• Responding to a survey = ability & motivation of the respondent AND skill of the interviewer
• Improve response rates by re-contacting and refusal conversion
• Bias is introduced when non-response is non-random
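The last point can be illustrated with a small simulation. This is a sketch under invented assumptions: political interest raises both turnout and the chance of responding, as the ESS charts suggest for interest and participation, and all probabilities are made up. It compares the turnout estimate under interest-driven (non-random) non-response with the estimate under random non-response at the same response rate.

```python
import random

random.seed(1)

# Hypothetical population (assumption): political interest (0-10) raises
# both turnout and the probability of taking part in the survey.
population = []
for _ in range(100_000):
    interest = random.randint(0, 10)
    voted = random.random() < 0.3 + 0.05 * interest
    population.append((interest, voted))


def response_probability(interest):
    """Assumed response mechanism: more interested people respond more often."""
    return 0.2 + 0.06 * interest


def turnout(rows):
    return sum(voted for _, voted in rows) / len(rows)


# Non-random non-response: response depends on interest (and hence on turnout)
respondents = [row for row in population if random.random() < response_probability(row[0])]
response_rate = len(respondents) / len(population)

# Random non-response with the same overall response rate, for comparison
random_respondents = [row for row in population if random.random() < response_rate]

true_turnout = turnout(population)
survey_turnout = turnout(respondents)
random_turnout = turnout(random_respondents)

print(f"response rate: {response_rate:.2f}")
print(f"true turnout: {true_turnout:.3f}")
print(f"estimate, non-random non-response: {survey_turnout:.3f}")
print(f"estimate, random non-response: {random_turnout:.3f}")
```

The two samples have the same response rate. Only the one where response is related to the outcome overstates turnout, which is why the response rate alone is an imperfect indicator of bias.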
Non-response, motivation & data quality
[Chart: % no initial refusal. European Social Survey: how interest motivates participation and may lead to bias in surveys]
[Chart: European Social Survey: interviewer ratings of respondents’ engagement with the survey]
Item Non-response
• Item non-response occurs when a respondent does not give an answer to a question
• As with the survey as a whole, respondents need to be motivated to respond to each question: to optimise rather than satisfice
• Optimising requires comprehension (cognition) and judgement
• Satisficing: straightlining, middle responses, ‘don’t know’ answers
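The satisficing patterns listed above can be turned into simple respondent-level indicators. This is a minimal sketch assuming a hypothetical battery of 5-point items where `None` marks a ‘don’t know’ or missing answer. The indicator definitions (all substantive answers identical; share of midpoint answers; share of items unanswered) are illustrative choices, not a standard coding scheme.

```python
# Hypothetical grid (assumption): each respondent answers the same battery of
# 5-point items; None = don't know / no answer.
responses = {
    "r1": [4, 2, 5, 3, 4, 1],
    "r2": [3, 3, 3, 3, 3, 3],          # identical answers: straightlining
    "r3": [5, 5, 4, 5, 5, 5],
    "r4": [2, None, None, None, 1, 2],  # heavy item non-response
}


def is_straightliner(answers):
    """True if every substantive answer in the grid is the same value."""
    given = [a for a in answers if a is not None]
    return len(given) > 1 and len(set(given)) == 1


def item_nonresponse_rate(answers):
    """Share of items left unanswered (item non-response)."""
    return sum(a is None for a in answers) / len(answers)


def midpoint_share(answers, midpoint=3):
    """Share of substantive answers at the scale midpoint (middle responding)."""
    given = [a for a in answers if a is not None]
    return sum(a == midpoint for a in given) / len(given) if given else 0.0


for rid, answers in responses.items():
    print(rid,
          "straightliner:", is_straightliner(answers),
          "item non-response:", round(item_nonresponse_rate(answers), 2),
          "midpoint share:", round(midpoint_share(answers), 2))
```

Indicators like these can then be compared across countries alongside response rates.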
Measuring Data Quality in CCS
[Chart: average number of professional survey organisations in each country]
Measuring Data Quality in CCS
[Chart: response rates]
Relationship between non-response & data quality
[Chart: reluctance to give answers and satisficing]
Sampling I
• At least two levels of sampling: country (level 2) and individual (level 1)
• Probability sample at level 1 (can be multi-stage)
• Level 2: not a probability sample of countries
• Assumption of exchangeability violated
Sampling II
• Level 2 is not a probability sample of countries: it is a purposive or convenience sample, or the whole population
• Researchers worry about the number of countries when conducting statistical analysis
• However, they should also be concerned about the type of sample:
  • Assumption of exchangeability violated
  • Not a problem if it is plausible that factors related to outcomes are unrelated to selection (Snijders)
Sample Characteristics: Bias?
[Charts: turnout (global average 66%); education (global average 8)]
And now for some comparisons using CSES data:
[Chart: % feeling close to a political party, by gender and Freedom House (FH) Political Rights]
Additional resources on translation
• Harkness, J.A. 2008. “Round 4 ESS Translation Strategies and Procedures.” European Social Survey. [http://www.europeansocialsurvey.org/index.php?option=com_docman&task=doc_download&gid=351&itemid=80]
• Harkness, J.A., and Schoua-Glusberg, A. 1998. “Questionnaires in Translation.” In: Harkness, J.A. (ed.), Cross-Cultural Survey Equivalence. ZUMA-Nachrichten Spezial No. 3. Mannheim: ZUMA.
Part II: Data Processing, Statistical Adjustments and Analysis
• Data linking: measures of context linked to individual survey responses
• Multi-level data
Weighting CCS Data to Adjust for Non-Response
• Types of weights: sample design weights, non-response weights and post-stratification weights
• Weighting compensates for unequal probabilities of selection, and also adjusts for non-response and stratification
• Treating the sample as a simple random (or otherwise representative) sample may lead to standard errors that are too small
Post-stratification Weights to Adjust for Non-Response
• Available in most survey data sets
• Requires auxiliary information about the population and may take a number of different variables into account
• Information usually needed:
  • Population estimates of the distribution of a set of demographic characteristics that have also been measured in the sample
  • For example, Census information such as gender, age, educational attainment, household size, residence (e.g. rural, urban, metropolitan) and region
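A minimal sketch of simple cell-based post-stratification, assuming hypothetical sample counts and hypothetical census shares for gender × education cells. Each respondent's weight is the population share of their cell divided by the sample share of that cell. Real survey weights often rake over several margins instead; that variant is not shown here.

```python
from collections import Counter

# Hypothetical sample of 100 respondents classified into post-stratification
# cells (gender x education; the cells are an illustrative assumption).
sample = (["male/low"] * 30 + ["male/high"] * 25
          + ["female/low"] * 25 + ["female/high"] * 20)

# Hypothetical census shares for the same cells (they must sum to 1).
population_share = {
    "male/low": 0.30, "male/high": 0.19,
    "female/low": 0.33, "female/high": 0.18,
}

n = len(sample)
sample_share = {cell: count / n for cell, count in Counter(sample).items()}

# Post-stratification weight for each cell: population share / sample share.
# Under-represented cells get weights above 1, over-represented cells below 1.
cell_weight = {cell: population_share[cell] / sample_share[cell] for cell in population_share}

# One weight per respondent; by construction they sum to n (mean weight 1).
weights = [cell_weight[cell] for cell in sample]

for cell, w in cell_weight.items():
    print(f"{cell:12s} weight = {w:.2f}")
```

After weighting, each cell's share of the weighted sample equals its census share (e.g. female/low: 25 × 1.32 / 100 = 0.33).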
Post-stratification Weights to Adjust for Non-Response
• Weights primarily adjust means and proportions: fine for descriptive statistics, but they may adversely affect inferential statistics and standard errors
• Weights almost always increase the standard errors of your estimates (unless the sample is self-weighting)
• Self-weighting means that every unit has an equal probability of selection into the sample
• Therefore, do you need to weight data for analysis? Most of us want to go beyond descriptive statistics
• Also, what to do at level 2, where there are no weights?
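One common way to quantify why weights usually increase standard errors is Kish's approximate design effect due to unequal weighting, deff = n·Σw² / (Σw)², with effective sample size (Σw)² / Σw². This sketch assumes the weights are unrelated to the outcome, as Kish's approximation does. The weight vectors are hypothetical.

```python
import math


def design_effect_weighting(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    return len(weights) * sum(w * w for w in weights) / sum(weights) ** 2


def kish_effective_n(weights):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical weight vectors, both with mean 1
equal = [1.0] * 1000                  # self-weighting sample
unequal = [0.5] * 500 + [1.5] * 500   # weights vary across respondents

for label, w in [("self-weighting", equal), ("unequal weights", unequal)]:
    deff = design_effect_weighting(w)
    print(f"{label}: deff = {deff:.2f}, effective n = {kish_effective_n(w):.0f}, "
          f"SE inflation = {math.sqrt(deff):.3f}")
```

With the unequal weights, 1,000 interviews carry roughly the information of 800, and standard errors grow by about 12%. A self-weighting sample has a design effect of exactly 1.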
Example: Weights in the European Social Survey
• Sampling design weights: Kish (1994, p. 173) provides the starting point of the sampling expert panel’s work: “[…] there is no need for similarity of sample designs. Flexibility of choice is particularly advisable for multinational comparisons, because the sampling resources differ greatly between countries. All this flexibility assumes probability selection methods: known probabilities of selection for all population elements.”
• Following this statement, an optimal sample design for cross-cultural surveys should consist of the best random sampling practice used in each participating country. The choice of a specific sample design depends on the availability of frames, experience and, of course, costs in the different countries. If adequate estimators are chosen after the survey has been conducted, the resulting values are comparable. To ensure this comparability, design weights have to be computed for each country. For this, the inclusion probabilities of every sample member at each stage of selection must be known and recorded in the Sample Design Data File (SDDF).
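A sketch of how design weights follow from the stage-wise inclusion probabilities recorded in an SDDF. The column names (`prob1`, `prob2`), the two-stage design and the probabilities are hypothetical. The rescaling so that weights average 1 within a country is shown as one common convention, not as the ESS procedure itself.

```python
from math import prod

# Hypothetical SDDF rows for a two-stage design (names and values are assumptions):
# stage 1 selects areas (PSUs), stage 2 selects one person within the sampled unit.
sddf = [
    {"idno": 1, "prob1": 0.02, "prob2": 0.50},
    {"idno": 2, "prob1": 0.02, "prob2": 0.25},
    {"idno": 3, "prob1": 0.05, "prob2": 1.00},
    {"idno": 4, "prob1": 0.05, "prob2": 0.50},
]

for row in sddf:
    # Overall inclusion probability = product of the probabilities at each stage
    row["incl_prob"] = prod(row[k] for k in ("prob1", "prob2"))
    # Raw design weight = inverse of the inclusion probability
    row["raw_weight"] = 1 / row["incl_prob"]

# Rescale so the design weights average 1 within the country (they sum to n)
mean_raw = sum(r["raw_weight"] for r in sddf) / len(sddf)
for row in sddf:
    row["dweight"] = row["raw_weight"] / mean_raw
    print(row["idno"], round(row["incl_prob"], 4), round(row["dweight"], 3))
```

Person 2, whose selection probability was half that of person 1 in the same area, receives twice the design weight. The rescaling leaves these relative weights unchanged.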
Problems with Weights
• Weights primarily adjust means and proportions: fine for descriptive statistics, but they may adversely affect inferential statistics and standard errors
• Weights almost always increase the standard errors of your estimates (unless the sample is self-weighting)
• Self-weighting means that every unit has an equal probability of selection into the sample
• Therefore, do you need to weight data for analysis? Most of us want to go beyond descriptive statistics
• NO WEIGHTS FOR LEVEL 2 to adjust for non-response or probability of selection of countries
Data Analysis with Weighted Data
• Use a statistical procedure that adjusts for the impact of the weights on the standard errors: standard errors should be based on the actual N, not the weighted N
• Not available in base SPSS: its WEIGHT command treats weights as frequency weights, which is incorrect for inferential statistics (the Complex Samples module is needed)
• In Stata: the svy procedures
• Or specify the weights as pweights
• fweights are not correct for sampling weights
• Another option is not to use weights at all in regression models. Instead, include all the variables used to create the weights as independent variables. If selection and non-response depend only on these variables, this yields unbiased estimates and standard errors. [MY PREFERENCE]
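The difference between pweights and fweights can be shown for a weighted mean. This sketch compares (a) treating weights as frequency weights, so the weighted N acts as the sample size, with (b) a linearization standard error for probability weights that depends on the actual n. It assumes a single-stage, with-replacement design with no strata or clusters. Real svy estimates also account for design features not modelled here. Data and weights are hypothetical.

```python
import math


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def se_fweight(y, w):
    """SE when weights are (wrongly) treated as frequency weights:
    the 'sample size' becomes sum(w), the weighted N."""
    big_n = sum(w)
    m = weighted_mean(y, w)
    var = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, y)) / (big_n - 1)
    return math.sqrt(var / big_n)


def se_pweight(y, w):
    """Linearization SE of a weighted mean with sampling (probability) weights,
    assuming a single-stage with-replacement design (no strata or clusters).
    It depends on the actual number of observations n, not on sum(w)."""
    n = len(y)
    total_w = sum(w)
    m = weighted_mean(y, w)
    z = [wi * (yi - m) / total_w for wi, yi in zip(w, y)]
    return math.sqrt(n / (n - 1) * sum(zi * zi for zi in z))


# Hypothetical data: 10 respondents, binary outcome, weights with mean 1
y = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
w = [1.2, 0.8, 1.5, 0.6, 1.0, 1.1, 0.9, 1.3, 0.7, 0.9]
# The same weights scaled up to a hypothetical population of 3 million
w_pop = [wi * 300_000 for wi in w]

for label, weights in [("mean-1 weights", w), ("population weights", w_pop)]:
    print(label, "fweight SE:", round(se_fweight(y, weights), 4),
          "pweight SE:", round(se_pweight(y, weights), 4))
```

The pweight standard error does not change when the weights are rescaled. The fweight standard error collapses once the weights sum to the population size, because 10 interviews are treated as 3 million observations.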
Macro and Micro: Data Linking
• We collect survey data across a number of countries because we are interested in variation across cultural contexts:
  • Mean differences correlated with country-level factors (differences in intercepts)
  • Relationships between variables of interest that vary across countries (differences in slopes)
• We want to measure country-level (contextual) factors so they become explanatory (level 2) variables
• These country-level factors need to be linked to individual-level survey variables
Macro and Micro: Data Linking
Technical issues (Stata):
• To link micro and macro data, each data set needs a linking identifier stored as a variable (e.g. country_number; below called macro_id)
• Sort each data set by the country identifier (macro_id) and save it
• Merge both data sets: merge macro_id using macrodata (in Stata 11 and later: merge m:1 macro_id using macrodata)
• Check the merged data: tabulate the new variable _merge (3 = matched, 1 = in the data in memory only, 2 = in the using data only)
• NOTE: Macro data will have one line of data per country, while survey data will have a line of data for each individual within each country
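The same many-to-one logic can be written out in Python to show what the merge does: each respondent row picks up its country's macro variables through the shared identifier, and a `_merge`-style code records whether each row matched. The data, the `fh_political_rights` variable and the `merge_m1` helper are hypothetical illustrations, not Stata's implementation.

```python
from collections import Counter

# Hypothetical micro (survey) data: one row per respondent.
survey = [
    {"resp_id": 1, "macro_id": 10, "voted": 1},
    {"resp_id": 2, "macro_id": 10, "voted": 0},
    {"resp_id": 3, "macro_id": 20, "voted": 1},
    {"resp_id": 4, "macro_id": 30, "voted": 1},  # country missing from macro data
]

# Hypothetical macro (country) data: one row per country.
macro = [
    {"macro_id": 10, "fh_political_rights": 1},
    {"macro_id": 20, "fh_political_rights": 3},
    {"macro_id": 40, "fh_political_rights": 6},  # country with no survey data
]


def merge_m1(master, using, key):
    """Many-to-one merge adding a Stata-like _merge code:
    1 = master only, 2 = using only, 3 = matched."""
    lookup = {}
    for row in using:
        if row[key] in lookup:
            raise ValueError(f"{key} does not uniquely identify rows in the macro data")
        lookup[row[key]] = row
    matched_keys = set()
    merged = []
    for row in master:
        macro_row = lookup.get(row[key])
        if macro_row is None:
            merged.append({**row, "_merge": 1})
        else:
            matched_keys.add(row[key])
            merged.append({**row, **macro_row, "_merge": 3})
    for k, row in lookup.items():
        if k not in matched_keys:
            merged.append({**row, "_merge": 2})
    return merged


merged = merge_m1(survey, macro, "macro_id")
# Equivalent of "tabulate _merge": inspect unmatched rows before analysis
print(Counter(r["_merge"] for r in merged))
```

The uniqueness check mirrors the note above: macro data should have exactly one line per country. A duplicated country identifier would otherwise silently attach the wrong context to respondents.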
CCS & Multi-level data structure
• Contain multiple levels of analysis, with each level consisting of distinct units of analysis
• Most common: hierarchical data
• Two-level structure: units at the lowest level of analysis (level-1 units) are nested within units at a higher level (level-2 units)
  • Data are “clustered”
  • Voters nested within districts
  • Voters nested within time
  • Panel data
  • Time-series cross-sectional data (TSCS)
• Three-level structure: e.g. voters nested within districts nested within countries, or students nested within classes nested within schools
CCS & Multi-level data structure
• Different ways to deal with clustered data, depending on how one treats between-cluster and within-cluster variation
• Pooling: the degree to which a cluster mean is drawn towards the overall mean
  • Complete pooling: OLS (doesn't distinguish within- from between-cluster variation)
  • No pooling: fixed-effects estimator (ignores between-cluster variation)
  • Between-effects estimator (ignores within-cluster variation; regression of cluster means)
  • Partial pooling: random intercept model
    • Uses information from both the individual clusters and the overall sample
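Partial pooling can be illustrated with the shrinkage formula behind random intercept estimates. Each country's own mean is weighted by its reliability λ = τ² / (τ² + σ²/n) and pulled towards the overall mean. In this sketch the variance components are assumed known, whereas a real multilevel model estimates them. The n-weighted sample mean also stands in for the model's estimated intercept. All numbers are hypothetical.

```python
# Hypothetical country summaries: (observed mean of the outcome, number of respondents)
countries = {"A": (0.70, 1000), "B": (0.40, 20), "C": (0.55, 200)}

# Assumed variance components (in practice estimated by the multilevel model)
tau2 = 0.01    # between-country variance of the true country means
sigma2 = 0.25  # within-country (respondent-level) variance

# Complete pooling: a single overall mean that ignores countries
n_total = sum(n for _, n in countries.values())
grand_mean = sum(m * n for m, n in countries.values()) / n_total


def reliability(n, tau2, sigma2):
    """Weight given to a country's own mean: larger for bigger samples."""
    return tau2 / (tau2 + sigma2 / n)


def partial_pool(country_mean, n, grand, tau2, sigma2):
    """Random-intercept style estimate: country mean shrunk towards the grand mean."""
    lam = reliability(n, tau2, sigma2)
    return lam * country_mean + (1 - lam) * grand


for name, (m, n) in countries.items():
    pooled = partial_pool(m, n, grand_mean, tau2, sigma2)
    print(f"{name}: no pooling {m:.3f}, complete pooling {grand_mean:.3f}, "
          f"partial pooling {pooled:.3f}")
```

Country B, with only 20 respondents, is pulled strongly towards the overall mean. Country A, with 1,000 respondents, barely moves. This is the sense in which partial pooling takes information from both the clusters and the overall sample.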
CCS & Multi-level data structure
• What sort of variation are we looking at across countries (or clusters)?
• Heterogeneity in the response: factors specific to each cluster may influence the outcome; these factors are shared by observations within each cluster
• Method: random intercept model
CCS & Multi-level data structure
• Rather than just the mean differing across countries, the relationship between two variables of interest could vary across countries
• For example, the relationship between gender and political participation may be weaker in countries with more egalitarian values and political structures
• Causal heterogeneity: the relationship between X and Y varies across clusters
  • How higher-level variables shape lower-level relationships
  • Method: random coefficient model
  • Prevents cluster confounding: assuming that within-cluster and between-cluster effects are identical when in fact they are not
A couple of different approaches:
• Two-stage (Shively and Long-Jusko in Political Analysis):
  • Estimate a regression line in each level-2 unit (e.g. constituency)
  • Then model the coefficients
• Packages such as MLwiN and HLM
• Stata and SPSS also have routines
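A minimal sketch of the two-stage idea applied to the gender and participation example. Stage 1 fits a separate bivariate OLS in each country. Stage 2 regresses the country-specific gender slopes on a country-level variable. The data-generating process, the "egalitarianism index" and the participation score are invented for illustration. The stage-2 regression here is plain unweighted OLS, with no adjustment for the differing precision of the first-stage estimates.

```python
import random

random.seed(7)


def ols(x, y):
    """Bivariate OLS: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data (assumption): in each of 30 countries, a participation score
# depends on gender (1 = woman), and the gender gap shrinks as a country-level
# egalitarianism index (0-1) rises.
countries = {}
for c in range(30):
    egalitarian = c / 29
    true_gap = -0.20 + 0.15 * egalitarian
    women = [random.randint(0, 1) for _ in range(500)]
    participation = [0.5 + true_gap * g + random.gauss(0, 0.3) for g in women]
    countries[c] = (egalitarian, women, participation)

# Stage 1: a separate regression in each level-2 unit
stage1 = {c: ols(x, y) for c, (_, x, y) in countries.items()}

# Stage 2: model the country-specific gender slopes with the country-level variable
z = [countries[c][0] for c in countries]
slopes = [stage1[c][1] for c in countries]
gamma0, gamma1 = ols(z, slopes)
print(f"stage-2 intercept {gamma0:.3f}, effect of egalitarianism on the gender gap {gamma1:.3f}")
```

The stage-2 coefficients should come out close to the simulated values (−0.20 and 0.15). They describe how a higher-level variable shapes a lower-level relationship, the same cross-level question a random coefficient model in MLwiN, HLM, Stata or SPSS addresses in one step.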