The response category labeling effect: How the wording of labels affects response distributions in Likert data. Bert Weijters, Maggie Geuens, Hans Baumgartner
Research questions • Do the labels attached to scale categories influence response behavior? • What mechanism(s) can account for this response category labeling effect? • Are there moderators of this effect? • What are the implications of the response category labeling effect for cross-cultural research?
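The importance of category labels • Example item: “I try to avoid foods that are high in cholesterol.” [Two versions of the response scale, differing in their category labels; figure not recovered]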
The intensity hypothesis • Label intensity refers to the perceived degree of (dis)agreement implied by the label; • More intense labels represent more extreme positions, which are endorsed less often (e.g., agree vs. strongly agree; superior vs. very good); • Even more subtle adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior; • Prior evidence that different intensities are associated with different adverbs (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement.
The fluency hypothesis • Research on processing fluency shows that the meta-cognitive experience of ease of processing affects judgment and decision making: • perceptions of the truth value of statements (e.g., Unkelbach 2007); • liking for objects and events (e.g., Reber, Schwarz, and Winkielman 2004); • Repeated statements are more likely to be rated as true (Unkelbach 2007) and repetition increases liking, as suggested by the mere exposure effect (e.g., Bornstein 1989), in part because repetition makes stimuli more familiar and contributes to greater processing fluency; • Words vary in how often they are encountered, and high frequency words are processed more fluently; • If scale labels are more commonly used in everyday language and are thus easier to process, this may increase the likelihood that the corresponding response option on the rating scale is selected;
Two alternative hypotheses to explain the effect of response category labels Intensity hypothesis: • H1: Response categories are endorsed less frequently if their labels are more intense. Fluency hypothesis: • H2: Response categories are endorsed more frequently if their labels are more fluent.
Verbal ability as a moderator of the fluency effect • When people are processing more carefully or are highly experienced, their actual thoughts, not the ease of generating them, play a more decisive role; • Verbal ability (as a form of language expertise) may moderate the fluency effect; • We posit that for respondents who tend to use words in a precise manner and who make fine-grained distinctions as to the exact meaning and implications of words, fluency will be less important as a cue in selecting a response.
Study 1: Scaling intensity and fluency • Do different methods for scaling the intensity and fluency of response category labels lead to similar results? If the intensity or fluency of scale labels is to have a reliable effect on responses to questionnaires, consistent differences in the perceived intensity and fluency of category labels should emerge across respondents. • Can we identify endpoint labels that vary significantly in intensity and fluency for use in subsequent studies? We need two labels that imply contradictory responses under the intensity and fluency hypotheses.
Study 1 (cont’d) • Label intensity • Direct ratings of intensity (0 = neutral; 10 = 100% agreement) • Pairwise comparisons of intensity (“Which expression indicates the stronger sense of agreement?”) • Label fluency • Direct ratings of fluency (0 = we never use this term in day-to-day speech; 10 = we use this term very often in day-to-day speech) • Pairwise comparisons of fluency (“Which expression is more commonly used in day-to-day speech?”) • Lexical decision task (press a button labeled ‘end category label’ or ‘not an end category label’ for six endpoint labels and five non-endpoint labels) • Word frequency counts in corpora of texts (Google hits, available for specific word combinations in particular countries and languages)
Study 1: Method • Sample 1: 83 undergraduates; pairwise comparisons of intensity and fluency of six endpoint labels; • Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and fluency on 11-point scales; • Sample 3: 125 undergraduates (57% female); lexical decision task.
Study 1: Results (cont’d) • For intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92; • The correlations of the means derived from the four fluency methods range from .66 to .97, with an average of r = .84; • Thus, there is considerable consistency in respondents’ judgments of the perceived intensity and fluency of different category labels; • ‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least fluent labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most fluent labels.
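As an illustration of how two scaling methods can be compared, the sketch below turns pairwise-comparison choices into a simple scale value per label (the share of comparisons that the label won) and correlates those values with mean direct ratings. The slides do not say how the paired comparisons were scaled, so the win-proportion scoring is an assumption, and the labels A to D and all numbers are hypothetical.

```python
from math import sqrt

def win_proportions(labels, choices):
    """Scale each label by the share of its pairwise comparisons that it won.

    choices maps a pair (a, b) to (number who picked a, number who picked b).
    This scoring rule is an assumption, not the method used in the study.
    """
    wins = {label: 0 for label in labels}
    total = {label: 0 for label in labels}
    for (a, b), (picked_a, picked_b) in choices.items():
        wins[a] += picked_a
        wins[b] += picked_b
        total[a] += picked_a + picked_b
        total[b] += picked_a + picked_b
    return {label: wins[label] / total[label] for label in labels}

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: four labels, 40 respondents per pair.
labels = ["A", "B", "C", "D"]
choices = {
    ("A", "B"): (10, 30),
    ("A", "C"): (8, 32),
    ("A", "D"): (4, 36),
    ("B", "C"): (15, 25),
    ("B", "D"): (9, 31),
    ("C", "D"): (14, 26),
}
# Hypothetical mean direct ratings on a 0-10 scale.
direct_means = {"A": 5.1, "B": 6.8, "C": 7.4, "D": 8.9}

paired = win_proportions(labels, choices)
r = pearson([paired[l] for l in labels], [direct_means[l] for l in labels])
print(f"correlation between methods: {r:.2f}")
```

A high correlation in such a comparison would indicate that both methods order the labels similarly, which is the kind of consistency the slide reports.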
Study 2 • Direct test of the intensity and fluency hypotheses: The endorsement rate for a high intensity and high fluency label should be relatively low if the intensity hypothesis is true, and it should be relatively high if the fluency hypothesis is true. • Preliminary test of whether the intensity/fluency of labels affects predictive validity.
Measuring response distributions • A major challenge is to measure differences in response distributions that are not item-specific and independent of substantive content; • To do this, we need to observe patterns of responses across heterogeneous items (i.e., items that do not share common content but have the same response format): • Deliberately designed scales consisting of heterogeneous items (Greenleaf 1992) • Random samples of items from scale inventories (Weijters, Geuens & Schillewaert 2010)
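A minimal sketch of a content-independent measure, assuming 5-point items and taking the top category as the endpoint of interest: count how often each respondent picks the endpoint across unrelated items, then compare the overall endorsement rate between two groups. The function names and the response data are hypothetical.

```python
def endpoint_count(responses, endpoint=5):
    """Count how many items one respondent answered with the endpoint category."""
    return sum(1 for r in responses if r == endpoint)

def endorsement_rate(respondents, endpoint=5):
    """Share of all item responses in a group that fall in the endpoint category."""
    n_responses = sum(len(r) for r in respondents)
    n_endpoint = sum(endpoint_count(r, endpoint) for r in respondents)
    return n_endpoint / n_responses

# Hypothetical responses: three respondents per group, six heterogeneous items each.
group_a = [[5, 4, 3, 5, 2, 4], [4, 4, 5, 3, 3, 2], [5, 5, 4, 5, 1, 3]]
group_b = [[4, 4, 3, 4, 2, 4], [5, 3, 4, 3, 3, 2], [4, 5, 4, 4, 1, 3]]

print([endpoint_count(r) for r in group_a])
print(f"group A: {endorsement_rate(group_a):.3f}, group B: {endorsement_rate(group_b):.3f}")
```

Because the items share a response format but not content, a difference in these rates is hard to attribute to any single topic.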
Study 2: Method • 161 Dutch-speaking respondents (mean age 31.27, 67% female) from a university panel were randomly assigned to two versions of an online questionnaire: • Endpoint labels of ‘completely (dis)agree’ • Endpoint labels of ‘strongly (dis)agree’ • Four sections: • 6 attitudinal items, one of which was “I love to go out for dinner”; • 10 heterogeneous items from unrelated scales (e.g., “I am a sensitive person”, “Financial security is important to me”), rated on 5-point scales; • Direct ratings of the intensity and fluency of six end labels (100-point scale for intensity, 11-point scale for fluency); • Behavioral measure of choice between five different vouchers worth 15 EUR (cinema, book, restaurant, theatre, gym);
Study 2: Results • The findings support the fluency hypothesis (p < .001 based on a Poisson regression); • A logistic regression of the choice of the restaurant voucher on label, attitude toward going out for dinner, and their interaction indicates a significant interaction: predictive validity is better for ‘strongly agree’ than for ‘completely agree’.
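The slides report a Poisson regression without giving its specification. The sketch below assumes the simplest version: the outcome is each respondent's count of endpoint responses and the only predictor is a dummy for the label condition. For that model the maximum likelihood estimates have a closed form, so no statistics library is needed. All counts are hypothetical.

```python
from math import erfc, log, sqrt

def poisson_two_group(counts_ref, counts_alt):
    """Poisson regression of a count on one binary predictor, in closed form.

    Returns (intercept, slope, standard error of slope, two-sided p-value).
    Both groups must contain at least one nonzero count.
    """
    total_ref, total_alt = sum(counts_ref), sum(counts_alt)
    mean_ref = total_ref / len(counts_ref)
    mean_alt = total_alt / len(counts_alt)
    intercept = log(mean_ref)             # log rate in the reference group
    slope = log(mean_alt / mean_ref)      # log rate ratio, alt vs. reference
    se = sqrt(1 / total_ref + 1 / total_alt)
    z = slope / se
    p_value = erfc(abs(z) / sqrt(2))      # two-sided normal (Wald) p-value
    return intercept, slope, se, p_value

# Hypothetical endpoint counts per respondent in the two label conditions.
counts_strongly = [1, 0, 2, 1, 3, 0, 1, 2]
counts_completely = [2, 3, 1, 4, 2, 3, 2, 3]

intercept, slope, se, p_value = poisson_two_group(counts_strongly, counts_completely)
print(f"rate ratio = {slope and 2.718281828459045 ** slope:.2f}, p = {p_value:.3f}")
```

A positive slope here means the second condition attracts more endpoint responses per respondent; a full analysis would typically also check for overdispersion.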
Study 3 Replication of the fluency effect with a sample drawn from the general population; Literacy as a potential moderator;
Study 3: Method • 369 Dutch-speaking panel members (mean age 45.8, 50% female) of an online market research agency in Belgium were randomly assigned to two versions of an online questionnaire: • Endpoint labels of ‘completely (dis)agree’ • Endpoint labels of ‘strongly (dis)agree’ • Questionnaire: • 16 heterogeneous items based on Greenleaf (1992), rated on 5-point scales; • Pairwise comparisons of four endpoint labels in terms of intensity and fluency (strongly, completely, fully, and absolutely); • Literacy measure: “I do a lot of reading” and “I prefer activities that don’t require a lot of reading” (strongly associated with having a higher education);
Study 3: Results • The findings support the fluency hypothesis (p < .05 based on a Poisson regression); • The fluency effect occurs primarily for respondents with lower literacy.
Study 4: Method • 271 Dutch-speaking panel members (mean age 39.2, 51% female) of an online market research agency in Belgium were randomly assigned to two versions of an online questionnaire: • Endpoint labels of ‘completely (dis)agree’ • Endpoint labels of ‘strongly (dis)agree’ • Questionnaire: • 10 heterogeneous items, rated on 5-point scales; • Pairwise comparisons of six response category labels in terms of intensity and fluency; • Antonym test as a measure of verbal ability (4 items); antonym test strongly associated with having a higher education;
Study 4: Results • Manipulation checks: • Fluency effect occurs primarily for respondents low in verbal ability (significant interaction, with significant simple main effect for low verbal ability respondents);
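One way to picture the moderation result is as a difference between two label effects. The sketch below assumes a Poisson model of endpoint counts with a dummy for label, a dummy for verbal ability (split into low and high), and their interaction; the slides do not state the model or the split, so both are assumptions. In that saturated model each simple effect is a log rate ratio and the interaction is the difference between them. The counts are hypothetical.

```python
from math import log, sqrt

def log_rate_ratio(counts_ref, counts_alt):
    """Return (estimate, standard error) of log(mean_alt / mean_ref)."""
    total_ref, total_alt = sum(counts_ref), sum(counts_alt)
    estimate = log((total_alt / len(counts_alt)) / (total_ref / len(counts_ref)))
    se = sqrt(1 / total_ref + 1 / total_alt)
    return estimate, se

# Hypothetical endpoint counts per respondent, by verbal ability and label.
cells = {
    ("low", "strongly"): [1, 0, 1, 2, 0, 1],
    ("low", "completely"): [3, 2, 4, 2, 3, 1],
    ("high", "strongly"): [2, 1, 2, 1, 2, 2],
    ("high", "completely"): [2, 2, 1, 3, 1, 1],
}

# Simple effect of the label within each verbal ability group.
effect_low, se_low = log_rate_ratio(cells[("low", "strongly")], cells[("low", "completely")])
effect_high, se_high = log_rate_ratio(cells[("high", "strongly")], cells[("high", "completely")])

# Interaction: how much the label effect differs between the two groups.
interaction = effect_high - effect_low
se_interaction = sqrt(se_low ** 2 + se_high ** 2)

print(f"low: {effect_low:.2f}, high: {effect_high:.2f}, interaction: {interaction:.2f}")
```

In these made-up data the label effect is present only in the low verbal ability group, which mirrors the pattern described on the slide.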
The moderating effect of verbal ability on the fluency effect
Implications of the category labeling effect for cross-cultural research • Response category labels can affect findings in a single-language context (e.g., estimation of population parameters, meta-analytic comparisons), but they are particularly important in cross-cultural research, where labels have to be translated; • Two types of translation: • Literal • Idiomatic • Some authors have emphasized the need to choose scale anchors that are equal in intensity (e.g., Harzing 2006), and prior research has demonstrated that supposedly similar terms may differ in intensity across languages (e.g., definitely vs. bestimmt; see Smith et al. 2009); • Translated adverbial modifiers may also differ in fluency;
Schematic representation of the translation process (based on Bassetti and Cook 2011)
Study 5: Method Approx. 200 English- or French-speaking respondents in five regions (nationality/language combinations) of North America and Europe; Five endpoint labels in each language; 16 heterogeneous items from Greenleaf (1992), rated on 5-point scales; Pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and fluency;
Study 5: Results • Intensity and fluency ratings by region • Note: Correlation between the fluency ratings and the natural logarithm of the number of Google hits was at least .88.
Study 5: Results Multilevel model estimates
Study 6 Demonstration that fluency is a viable determinant of extreme responding differences between regions in an international survey; Illustration of how to construct and use relative measures of fluency and extreme responding based on secondary data only;
Study 6: Method 13,520 respondents from 17 European regions; 16 heterogeneous items based on Greenleaf (1992); Use of fully labeled 7-point response scales; Fluency: relative measure of fluency as the natural logarithm of the ratio of the number of Google hits for the 7th category (strongly agree) to the number of Google hits for the 6th category (agree); Endorsement: relative endorsement of the 7th vs. the 6th response category (natural logarithm).
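The two relative measures can be sketched as log ratios computed per region and then related across regions. In a regression with a single predictor, the standardized slope equals the Pearson correlation, which is what the code computes. The region names, hit counts, and response counts below are hypothetical and do not come from the study.

```python
from math import log, sqrt

def log_ratio(numerator, denominator):
    """Natural logarithm of a ratio of two positive counts."""
    return log(numerator / denominator)

def pearson(xs, ys):
    """Pearson correlation; equals the standardized slope with one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-region data: search hits for the labels of categories 7 and 6,
# and the number of responses falling in categories 7 and 6.
regions = {
    "R1": {"hits_7": 120_000, "hits_6": 900_000, "n_7": 310, "n_6": 820},
    "R2": {"hits_7": 400_000, "hits_6": 950_000, "n_7": 520, "n_6": 760},
    "R3": {"hits_7": 60_000, "hits_6": 880_000, "n_7": 240, "n_6": 900},
    "R4": {"hits_7": 250_000, "hits_6": 700_000, "n_7": 450, "n_6": 800},
}

fluency = [log_ratio(d["hits_7"], d["hits_6"]) for d in regions.values()]
endorsement = [log_ratio(d["n_7"], d["n_6"]) for d in regions.values()]

beta = pearson(fluency, endorsement)   # standardized regression slope
r_squared = beta ** 2                  # share of variance explained
print(f"standardized slope = {beta:.2f}, R^2 = {r_squared:.2f}")
```

Working with ratios to the neighboring category makes both measures relative, so overall differences between regions in search volume or sample size cancel out.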
Sample descriptive statistics Pan-European study (Study 7 and 8)
Study 6: Results • Note: Standardized regression slope of .67 (p < .01, R² = .45)
Discussion: Summary of findings • Response category labels that are more commonly used (i.e., that are more fluent) lead to higher endorsement of their associated response categories; • Respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category, but they are also influenced by the fluency of the labels; • The effect of fluency is more pronounced for respondents who are lower in literacy and verbal ability; • The problem may be particularly serious in cross-cultural research when different languages are used.
Implications for multilingual survey research • Translations usually imply a trade-off between the attempt to be literal and the attempt to be idiomatic; • Optimize equivalence: use response category labels that are equally fluent in different languages (rather than literal translations or words with equal intensity); e.g., ‘strongly agree’ is most commonly used in scales, but may not have valid equivalents in some other languages. ‘Completely agree’ seems to be a viable alternative:
Label                     Fluency   ERS%
Completely agree          1.24      18.8%
Tout à fait d’accord      1.22      19.2%