Field experiments for assessing question validity. Patrick Sturgis, Department of Sociology, University of Surrey, UK. Paper presented at conference on ‘Survey Measurement: Assessing the Reliability and Validity of Contemporary Questionnaire Items’, The Royal Statistical Society, 10 April 2008.
Plan of Talk • Standard validity assessment for survey questions • Field experiments • Example 1 – Political knowledge • Example 2 – Social trust • Concluding remarks
Standard validity assessment • Nothing • Face/process validity • Correlation with criterion variables • Multi-trait multi-method (MTMM) • Expert panels • Behaviour coding • Interviewer debriefing • Think-aloud protocols/cognitive interviews
Limitations • Small n/purposive selection – do inferences generalize? • Do different techniques/researchers identify the same ‘problems’? • Do modifications actually increase validity? • Paradoxical limitations for survey research!
Field Experiments • Large n with randomization of alternate forms • Clean and powerful inference • Lack of criterion reference can be problematic • But theory can help!
Example 1 (with Nick Allum, Patten Smith) Measuring Political Knowledge: Guessing and partial knowledge
Standard approach • MCQ format: • “The number of MPs in Parliament is about 100” • True • False • DK • DKs ‘encouraged’ • Two key problems (Mondak 2001; 2002): • Some say DK when they can answer correctly at p > 0.5 (partial knowledge) • Some provide a substantive answer when they cannot answer correctly at p > 0.5 (guessing)
Personality Variance • Variation in knowledge scores reflects more than just knowledge • Men more likely to guess in absence of knowledge • Women more likely to say DK with partial knowledge • Thus, men ‘appear’ to know more about politics than women
The Solution? • Force all respondents to provide an answer even if they genuinely DK (Mondak 2001) • Randomly allocate residual DKs across substantive categories • Removes personality variance by omitting option of guessing • And of saying DK in presence of partial knowledge
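Mondak’s correction described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `allocate_dks` and the sample data are hypothetical, not from the paper.

```python
import random

def allocate_dks(responses, options=("true", "false"), seed=0):
    """Randomly reassign residual 'dk' answers to a substantive
    category (sketch of Mondak's 2001 random-allocation correction;
    names and labels are illustrative)."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.choice(options) if r == "dk" else r for r in responses]

# hypothetical responses to one true/false item
answers = ["true", "dk", "false", "dk", "true"]
print(allocate_dks(answers))  # all DKs replaced by a random true/false
```

Because DKs are allocated at random, they are correct with probability 0.5, which is why the comparison with probed DKs (below) is informative.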
Study 1 - Partial Knowledge • BMRB CATI omnibus (quota sample) • Interviewing 17-19 December 2004 • N = 1006 • Three true/false knowledge items: • Britain's electoral system is based on proportional representation • MPs from different parties are on parliamentary committees • The Conservatives are opposed to the ratification of a constitution for the European Union
Design • “For the next few questions, I am going to read out some statements, and for each one, please tell me if it is true or false. If you don't know, just say so and we will skip to the next one” • If respondent answers DK: • “You said earlier that you don't know whether the number of MPs is about 100. Could you please just give me your best guess?” • Partial knowledge in initial DK responses if % correct after probe > .5
Results • Binary logit predicting a correct answer (0,1) • Model-predicted probabilities of a correct answer: • 71% for those giving an initial response • 53% for probed DKs • 50% for random allocation • No gender difference
Study 2 - Guessing • Ask standard format knowledge questions but where answer options are all wrong • Respondents choosing any substantive alternative are ‘guessing’: • Who is the Secretary of State for Trade and Industry? Is it, a. Geoff Hoon b. Peter Hain or c. Do you not know? (correct=Alan Johnson) • BMRB omnibus n=2011, 4-6 November 2005 and 9-11 December 2005
Who is the Secretary of State for Trade and Industry? Is it, a. Geoff Hoon, b. Alan Johnson, or c. Do you not know?
Who is the Secretary of State for Trade and Industry? Is it, a. Geoff Hoon, b. Peter Hain, or c. Do you not know?
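In the second version every substantive option is wrong, so any respondent choosing one must be guessing. The estimator is then just the share of substantive choices; the sketch below is illustrative (the function name and the response data are hypothetical).

```python
def guessing_rate(choices, dk_label="dk"):
    """Share of respondents choosing any substantive option on an
    item where every substantive option is wrong, so any such
    choice must be a guess (illustrative sketch)."""
    substantive = sum(1 for c in choices if c != dk_label)
    return substantive / len(choices)

# hypothetical responses to the all-wrong version of the item
resp = ["dk", "hoon", "dk", "hain", "dk", "dk", "hoon", "dk"]
print(guessing_rate(resp))  # 3 of 8 chose a (wrong) substantive option
```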
Binary Logit Model • Dependent variable: guess = 1, DK = 0
Conclusions • No evidence that DKs in survey knowledge items conceal partial knowledge • Guessing, however, is common and differential (favouring men) • Guessing also related to political knowledge • Recommendation: use ‘standard’ format items • For marginal comparisons, randomly allocate DKs to substantive categories • For associational relationships use number right scoring (treat DK and incorrect as equivalent)
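The two scoring rules recommended above can be sketched as follows; function names and example items are hypothetical, not taken from the paper.

```python
import random

def number_right_score(answers, key):
    """Number-right scoring: DK and incorrect answers both score 0,
    as recommended for associational analyses."""
    return sum(1 for a, k in zip(answers, key) if a == k)

def score_with_random_dks(answers, key, options=("true", "false"), seed=0):
    """For marginal comparisons: randomly allocate DKs to substantive
    categories before scoring (sketch of the slide's recommendation)."""
    rng = random.Random(seed)
    filled = [rng.choice(options) if a == "dk" else a for a in answers]
    return number_right_score(filled, key)

key = ["true", "false", "true"]          # hypothetical answer key
resp = ["true", "dk", "false"]           # one correct, one DK, one wrong
print(number_right_score(resp, key))     # 1: DK and wrong both count 0
```

Under number-right scoring the DK and the incorrect answer are equivalent; under random allocation the DK has a 50% chance of scoring a point, which is unbiased for group comparisons in the aggregate.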
Example 2 (with Patten Smith) Investigating Social Trust Using think-alouds
Conceptions of Trust • Trust is a ‘good thing’ • Trusting citizens are good citizens (voting, volunteering, civic engagement) • Trusting societies are good societies (more democratic, egalitarian, stronger economic performance) • Trust ‘lubricates’ social and economic transactions • Reduces ‘monitoring costs’
‘Thick’ Trust • Also ‘particularized’ or ‘strategic’ trust • Between people who know one another • Based on personal experience • Encapsulated interests; ‘your interests are my interests’ (Hardin) • I trust x to do y
‘Thin’ trust • Also ‘social’ or ‘generalized’ trust • Trust between people not personally known to one another • More akin to a core social value or attitude • “an evaluation of the moral standards of the society in which we live” (Newton) • A ‘default position’ in transactions with unknown others
Does this matter? • Primary social and individual returns are to thin/social trust • Thick and thin trust may even be negatively correlated • The less we trust people in general, the more we retreat to the safety of those we know • So, empirically distinct measures are clearly essential
The standard trust question • Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people? • Most people can be trusted • Can’t be too careful • Usually credited to Rosenberg (1959), the ‘Rosenberg Generalized Trust’ (RGT) item
The Local Area Trust item • How much do you trust people in your local area? • a lot • a fair amount • not very much • not at all • Reflects Putnam’s emphasis on trust being a property of local areas
Trust by Question type • These items are both used more or less interchangeably as measures of generalized trust • Yet, they yield very different estimates of trust at the national level. e.g.: • Social Capital Community Benchmark survey: 47% most people can be trusted; 83% trust people in local area ‘some’ or ‘a lot’ • UK Taking Part survey: 44% most people can be trusted; 74% trust ‘many’ or ‘some’ of the people in their local area • Why such a large discrepancy in generalized trust (trust in strangers)?
Research Design • Ipsos-MORI general population omnibus survey • Random selection of small areas, quota-controlled selection of individuals • n = 989 (fieldwork, November 2007) • Respondents randomly assigned to RGT or TLA item • Follow-up probe: “In answering the last question, who came to mind when you were thinking about ‘most people’/‘people in your local area’?”
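The split-ballot assignment in this design can be sketched simply; the function name is hypothetical, and the RGT/TLA labels follow the slides.

```python
import random

def assign_forms(respondent_ids, forms=("RGT", "TLA"), seed=0):
    """Randomly assign each respondent to one question wording
    (split-ballot design; an illustrative sketch, not the survey's
    actual assignment code)."""
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    return {rid: rng.choice(forms) for rid in respondent_ids}

allocation = assign_forms(range(6))
print(allocation)  # each respondent mapped to one of the two forms
```

Because assignment is random, any difference in responses between the two groups can be attributed to the question wording rather than to respondent characteristics.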
Concluding Remarks • Large-scale field experiments are a useful way of assessing validity of questions • Random sample + random manipulation yields strong inferential power • Under-utilized due to cost considerations • But are they really so costly? • A complement to rather than replacement for small n approaches
Papers • Sturgis, P., Allum, N. & Smith, P. (2008) ‘The Measurement of Political Knowledge in Surveys’. Public Opinion Quarterly 72, 90-102. • Sturgis, P. & Smith, P. (2008) ‘Assessing the Validity of Generalized Trust Questions: What kind of trust are we measuring?’ Paper presented at the Conference on Composite Scores, ESADE, Barcelona, 14-15 February 2008. • Sturgis, P. & Smith, P. (2007) ‘Fictitious Issues Revisited: political knowledge, interest, and the generation of nonattitudes’ (under review). • Sturgis, P., Choo, M. & Smith, P. (2007) ‘Response Order, Party Choice, and Evaluations of the National Economy: A Survey Experiment’. Survey Research Methods (in press).