Survey Sampling Issues and Solutions: A Comprehensive Overview

Sampling Issues: Part I • Leonie Huddy, Stony Brook University, leonie.huddy@sunysb.edu • Matthew Baum, Harvard University

Outline • I. Major Sources of Survey Error • II. Coverage Error • 1. Coverage Problems in US (Phone & Web) • 2. Coverage Issues in Sweden • 3. Implications for Experiments • III. Non-Response Error • 1. Rates in the US • 2. Factors Influencing Response Rates • 3. Response Rates in Sweden • IV. Survey Mode Errors

I. Major Sources of Survey Error (Alwin/Groves) • 1. Coverage error: Error due to failure to include some elements of the population in the sampling frame (e.g., cell phones in an RDD landline study in the US, non-computer households in a web survey) • 2. Sampling error: Error due to sampling a subset rather than the entire population • 3. Non-response error: Error due to failure to obtain data from all selected population elements (e.g., young males are harder to reach; Latinos are more reluctant to respond) • 4. Measurement error: Error that occurs when the observed value differs from the true value (e.g., over-reports of voter turnout in the ANES) • These errors also apply to survey experiments

II. Coverage Error (Groves) • DEFINITIONS • Sampling frame: set of lists or procedures intended to identify all elements of the target population; e.g., RDD, national registry (SPAR), US Postal Mail Delivery System • Coverage problems: • Undercoverage: some population elements are missing from the sample frame (e.g., cell phone users, who are disproportionately young and less affluent, in an RDD landline study; older respondents who lack computers or broadband in a web survey) • Ineligible units (e.g., non-working phone numbers) • Clustering of elements at a single frame element (several people with 1 phone number) • Duplication: single target element linked to multiple frame units (a person listed more than once in a national registry)
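The frame problems defined above can be sketched as set comparisons between a target population and a frame. This is a minimal illustration, not a procedure from the slides: the function name `frame_diagnostics` and the toy names are hypothetical, and clustering (several people behind one frame entry) is left out because it needs a mapping from frame entries to people.

```python
from collections import Counter

def frame_diagnostics(target_population, frame_entries):
    """Compare a sampling frame with the target population (toy sketch)."""
    target = set(target_population)
    counts = Counter(frame_entries)  # how often each unit is listed in the frame
    listed = set(counts)
    return {
        # Target elements the frame fails to list
        "undercoverage": target - listed,
        # Frame entries that are not part of the target population
        "ineligible": listed - target,
        # Target elements listed more than once
        "duplicates": {u for u, k in counts.items() if k > 1 and u in target},
    }

# Hypothetical toy data
target = {"ana", "ben", "carla", "dev"}
frame = ["ana", "ben", "ben", "zed"]
report = frame_diagnostics(target, frame)
print(report)
```

In this toy case, two target members are never listed, one frame entry is ineligible, and one person appears twice.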

2. Non-Coverage Problems in Telephone Samples • NON-LANDLINE HOUSEHOLDS • Younger, more mobile, less affluent • More ethnic and racial minorities • Live in rural areas, the South, central cities • Now reaching 25% of the US population • Solution? • Base the sample on a mix of cell phone and landline households, and screen out respondents with landlines from the cell phone sample so that it covers cell-only households • Post-stratification weights based on demographic factors

The growing cell-only population by age (Pew 2010)

Age Composition of Landline Phone Samples (Pew, 2010)

Non-Coverage on Web: % with Broadband at Home (Pew)
                            2005   2006   2007   2009
All adult Americans          30%    42%    47%    60%
Gender
  Male                       31     45     50     61
  Female                     27     38     44     58
Age
  18-29                      38     55     63     76
  30-49                      36     50     59     67
  50-64                      27     38     40     56
  65+                         8     13     15     26
Race/Ethnicity
  White (not Hispanic)       31     42     48     63
  Black (not Hispanic)       14     31     40     52
Education
  Less than high school      10     17     21     24
  High school grad           20     31     34     46
  Some college               35     47     58     73
  College +                  47     62     70     83
Income
  Under $30K                 15     21     30     42
  $30K-50K                   27     43     46     62
  $50K-$75K                  35     48     58     73
  Over $75K                  57     68     76     83

4. Non-Coverage Issues in Sweden • Telephone & Mail • Sweden has a low rate of cell-only households: ≤ 5% (Hecke & Weise 2012; in Telephone Surveys in Europe, ed. Häider, Häider, & Kühne; Springer, Heidelberg) • Non-coverage is far less of a problem in Sweden because samples are drawn from the national SPAR registry • No sample frame is ever 100% complete, so there may still be minor non-coverage issues • Internet • “Sweden has a unique position in the world when it comes to Internet use, not only because it is one of the countries with highest share of Internet users in the world but also because Internet use is more widely spread in Swedish society compared to other countries, in terms of age and educational level (Findahl 2007; 2008b). Among younger Swedes 16–25 years old almost all (97%) use the Internet at least once a month; among older Swedes 56–65 years old Internet use is currently as high as 75%. The corresponding figure among individuals 66–75 years old is lower, however, at 51% (Findahl 2008a).” (quoted in Källmén et al. 2011)

SPAR – National Swedish Population Registry • Statens personadressregister (SPAR) includes all persons who are registered as resident in Sweden. • The data in SPAR are updated each day with data from the Swedish Population Register. • SPAR is specifically regulated in Swedish law by the Act (1998:527) on Statens personadressregister, by the Regulation (1998:1234) on Statens personadressregister, and by the Swedish Tax Agency Regulation on handing out data from SPAR (SKVFS 2011:06). • The aim of SPAR is clear from the purposes set out in article 3 of the Act. It states that personal data in SPAR may be processed to: • update, supplement and verify personal information, or • select names and addresses for direct marketing, public service announcements or other comparable activities. • Processing data in this respect is the same as handing out the data electronically. Data in SPAR are, after a decision by the Swedish Tax Agency, handed out electronically at cost price.

III. Non-Response Error • Two key types of non-response: • Non-contact: the failure to reach the chosen respondent • Refusal: the chosen respondent does not cooperate • Response rates have declined precipitously in the US over the last 2 decades • Contact rates by telephone dropped dramatically after 2000 and the introduction of caller ID • Refusal rates are higher in urban areas

Response Rates in the U.S. • Response Rate = number of people who completed an interview / total number of eligible respondents the survey attempted to contact (including those not at home, those who refused, etc.) • Household CAPI or in-person surveys: around 50-60% at US university research centers. • Telephone surveys: 40-50% at US university centers using very stringent and expensive methods; lower for typical phone surveys at university centers (25-35%); much lower for marketing and media surveys (6-20%). • Mail surveys: Highly variable; 15-20% RR is possible with follow-up, but it depends on the population. • Web surveys: Depends on the population. Could be as high as 50-70% within an organization with a known email list and organizational support, or <1% with a random group (e.g., banner ad recruitment).
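The response rate definition on this slide can be written out as a short calculation. The sketch below is illustrative only: the function name, the disposition categories passed as arguments, and the counts are hypothetical and not taken from any survey cited here.

```python
def response_rate(completed, refused, not_at_home, other_eligible=0):
    """Completed interviews divided by all eligible cases in the sample."""
    eligible = completed + refused + not_at_home + other_eligible
    if eligible == 0:
        raise ValueError("no eligible cases")
    return completed / eligible

# Hypothetical case dispositions for a phone survey
rr = response_rate(completed=450, refused=300, not_at_home=250)
print(f"Response rate: {rr:.1%}")
```

Ineligible units (e.g., non-working numbers) are deliberately left out of the denominator, which is why they are not an argument here.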

What Influences Response Rates (RR)? • 1. SURVEY MODE: Highest for household in-person interviews, generally lowest on the web. • 2. RESPONDENT SUBGROUP: Non-response is often higher in cities; it can also vary with age (the young are harder to contact) and gender (men are harder to contact). • 3. TYPE OF SURVEY ORGANIZATION: Academic vs. commercial polls. RR is typically higher when the survey is conducted by an academic or non-profit organization. • 4. UNIT OF INTERVIEW: Higher RR if anyone in the home or a surrogate can be interviewed. • US National Health Interview Survey (NHIS): non-response rate is xx% • Sweden Census (SCB) contacts relatives of respondents to increase RR (from Jacob Sohlberg) • 5. EFFORT TO REACH NON-RESPONDENTS: A greater number of contact attempts, financial incentives, refusal conversion, and a longer interviewing period all raise RR but also increase costs. • 6. SURVEY TOPIC AND RESPONDENT INTEREST: Slightly higher RR for surveys on topics in the news or those in which the respondent is very involved and interested.

Who is Missing? US Non-Respondents • 1. Age: The young are underrepresented • Largely due to non-contact • 2. Gender: Men are underrepresented • More difficult to contact and refuse more often • 3. Race/Ethnicity: • Blacks are overrepresented in phone samples • Blacks are underrepresented in household in-person samples • Typical solution: Weight respondents to demographic population benchmarks
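Weighting respondents to population benchmarks can be sketched as simple post-stratification: each group's weight is its population share divided by its sample share. The age groups, sample counts, and benchmark shares below are hypothetical, and real weighting schemes typically use several variables at once (e.g., raking), which this sketch does not attempt.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {
        group: population_shares[group] / (count / n)
        for group, count in counts.items()
    }

# Hypothetical sample that under-represents the young
sample = ["18-29"] * 10 + ["30-64"] * 60 + ["65+"] * 30
# Hypothetical population benchmarks (shares sum to 1)
benchmarks = {"18-29": 0.20, "30-64": 0.60, "65+": 0.20}

weights = poststratification_weights(sample, benchmarks)
for group in sorted(weights):
    print(group, round(weights[group], 3))
```

Here the under-represented young respondents are weighted up and the over-represented oldest group is weighted down, while the weights still sum to the sample size.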

Non-Response Rates in Sweden • Vary by survey mode (mail, phone, web, IVR) • RR remains high in mail surveys

Response Rates, Web vs. Mail: Sweden (Källmén et al. 2011) • Two random samples of 1250 individuals each were drawn from the same national register (DAFA-SPAR) of all Swedish individuals (aged 17-71) with a registered address. • The electronic, web-based response group received a postcard with the same introductory text, a URL link, and a log-in code to the electronic version of the questionnaire. • Paper-and-pen response group: two reminders were sent, three and six weeks after the main mailing. • After the first mailing, 314 individuals (25%) responded to the AUDIT paper version and 167 (13%) responded to the web-based version. Following the first reminder, the total number of responses was 483 (39%) in the paper group and 230 (18%) in the web-based group. • After the second reminder, the final number of responses for the paper version was 663 (53.6%): 276 men and 344 women (43 did not disclose their gender). For the web-based version of the AUDIT, the final number of responses was 324 (26.2%): 140 men and 184 women.

Web vs. Interactive Voice Response (IVR): Sinadinovic et al., 2011

5. So What? Implications for Survey Experiments • The major problem with coverage and non-response errors is sample bias: the sample may be overly educated, too sophisticated, older, etc. • Does this matter when running an experiment with random assignment? • It depends on: • Whether the experimental treatment effect is heterogeneous • Whether the sources of treatment heterogeneity are well theorized and well measured (an issue to which we will return when discussing measurement issues) • The following slides cover 2 examples of heterogeneous experimental treatment effects that depend on level of political sophistication (involvement or partisanship).

From Druckman and Kam, 2011 • The external validity of a single experimental study must be assessed in light of an entire research agenda, and in light of the goal of the study (e.g., testing a theory or searching for facts). • Assessment of external validity involves multiple dimensions including the sample, context, time, and conceptual operationalization. There is no reason per se to prioritize the sample as the source of an inferential problem. • The nature of the sample, and the use of students, matters in certain cases. However, a necessary condition is a heterogeneous (or moderated) treatment effect. Then the impact depends on: • If the heterogeneous effect is theorized, the sample only matters if there is virtually no variance on the moderator. • The range of heterogeneous, non-theorized cases may be much smaller than often thought. Indeed, when it comes to a host of politically relevant variables, student samples do not significantly differ from non-student samples. • There are cases where student samples are desirable since they facilitate causal tests or make for more challenging assessments.

Sources of Survey Error (Alwin)
               Bias                Variance
Non-observed   coverage bias       coverage error variance
               sampling bias       sampling error variance
               nonresponse bias    nonresponse error variance
Observed       interviewer bias    interviewer error variance
               respondent bias     respondent error variance
               instrument bias     instrument error variance
               mode bias           mode error variance

Effects of Sample Bias (Coverage and/or Non-Response): Unpredictable Effects in Experiments • Bias can enhance, dampen, or have no effect on the experimental outcome • Example 1: From The Ambivalent Partisan (Lavine et al.) • Most sophisticated LEAST affected by ideology in presence of a partisan cue • In the following example, researchers are interested in whether partisan labels override ideological content in support for a policy. • The answer varies with the mix of ambivalent vs. strong, univalent partisans in the sample. Bias in the sample towards strong partisans would lead to stronger overall effects of a partisan cue.
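Why sample composition matters under a heterogeneous treatment effect can be shown with a small calculation: the overall effect is a mixture of the subgroup effects, weighted by each subgroup's share of the sample. The effect sizes and shares below are made up for illustration and are not estimates from The Ambivalent Partisan.

```python
def average_treatment_effect(share_strong, effect_strong, effect_ambivalent):
    """Overall effect = share-weighted mix of the two subgroup effects."""
    return share_strong * effect_strong + (1 - share_strong) * effect_ambivalent

# Hypothetical effects of a partisan cue in each subgroup
EFFECT_STRONG = 0.30      # strong, univalent partisans
EFFECT_AMBIVALENT = 0.05  # ambivalent partisans

# Hypothetical shares of strong partisans: population vs. biased sample
population_ate = average_treatment_effect(0.40, EFFECT_STRONG, EFFECT_AMBIVALENT)
biased_ate = average_treatment_effect(0.70, EFFECT_STRONG, EFFECT_AMBIVALENT)
print(f"population: {population_ate:.3f}, biased sample: {biased_ate:.3f}")

# With a homogeneous effect, the sample mix makes no difference
same_a = average_treatment_effect(0.40, 0.20, 0.20)
same_b = average_treatment_effect(0.70, 0.20, 0.20)
```

Under these assumptions, the sample that over-represents strong partisans overstates the cue effect, while the homogeneous case gives the same answer for any mix.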

Knowledge Networks Policy Study; The Ambivalent Partisan (Lavine, Johnson, Steenbergen, in press)
Policy Only Condition: Congress has recently debated two policy measures dealing with benefits to social welfare recipients. The first policy, POLICY 1, calls for $1000 per month for a family of one child, with an additional $200 dollars for each additional child. These benefits are intended to last 7 years. Recipients would also receive $2,000 a year in food stamps and extra subsidies for housing and child care. (Generous) The second policy, POLICY 2, calls for $400 per month for a family of one child, with an additional $50 dollars for one additional child. These benefits are intended to last for 3 years. Recipients would also receive $500 a year in food stamps but no extra subsidies for housing or child care. (Less Generous)
Policy + Cue Condition: Democrats and Republicans in Congress have recently debated two policy measures dealing with benefits to social welfare recipients. The first policy, POLICY 1, proposed by Republicans, calls for $1000 per month for a family of one child, with an additional $200 dollars for each additional child. These benefits are intended to last 7 years. Under this Republican plan, recipients would also receive $2,000 a year in food stamps and extra subsidies for housing and child care. (Generous) The second policy, POLICY 2, proposed by Democrats, calls for $400 per month for a family of one child, with an additional $50 dollars for one additional child. These benefits are intended to last for 3 years. Under this Democratic plan, recipients would also receive $500 a year in food stamps but no extra subsidies for housing or child care. (Less Generous)

Predicted Marginal Effect of Liberal vs. Conservative Political Orientation on Preference for the More Generous Policy Proposal: Knowledge Networks Panel

Effects of Sample Bias: Example 2 • Policy Support and Emotive Visual Imagery (Huddy & Gunthorsdottir, 2000) • Highly involved MOST affected by visual cue • In this example, the goal was to understand how a positive or negative image of an animal that would be saved by an environmental policy affects support for that policy • The effects varied with one’s position on environmental issues, so the findings would be stronger in a sample biased towards pro-environment views

Stimulus Materials • The design of this study is a 2 (pro- or anti-environment message) × 5 (no animal, cute mammal, ugly mammal, cute insect, ugly insect) between-subjects factorial design. • The stimulus material consisted of flyers emulating pro- and anti-environment fundraising letters. All flyers, whether pro- or anti-environment, were about the same fictitious environmental dilemma, in which mining would assist an impoverished population living in the Guatemalan rainforest but would destroy the habitat of a geographically restricted animal. • The pro-environment flyer argued for the protection of the animal; the anti-environment flyer argued that human needs outweigh environmental concerns. • Both the name of the fictitious animal, Guatemalan Cobyx, and the fictitious organization, Club Berneaud International (CBI), were held constant.

Figure: Predicted Levels of Action for a Pro-Environment Organization Among Strongest Environment Supporters. Lines: High Involvement, Low Involvement. X-axis (Emotive Image): No Picture, Butterfly, Bug, Bat, Monkey. Y-axis scale: 0 to 9. Note: Predicted levels of action calculated at a value of .25 on the pro-environment scale.

IV. Survey Mode Errors: Non-Response, Non-Coverage, and Measurement Error • 1. Survey Mode Errors Can Conflate Several Sources of Error • In practice, mode effects can reflect a different sample population, non-coverage, and non-response errors. • Population differences can be eliminated by randomly assigning respondents to mode from within the same population (e.g., SPAR) • Large differences in response rate by mode remain in Sweden; e.g., Källmén et al.
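Random assignment to mode from a single frame can be sketched as one random draw that is then split into equal groups. This is only an illustration: the frame is a stand-in list of integer IDs rather than a real register extract, and the function name, group size, and seed are assumptions chosen to echo the two-sample design described earlier.

```python
import random

def assign_modes(frame_ids, n_per_mode, modes=("paper", "web"), seed=0):
    """Draw one random sample from the frame and split it across modes."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    drawn = rng.sample(frame_ids, n_per_mode * len(modes))
    # rng.sample returns units in random order, so consecutive slices
    # are themselves random, non-overlapping groups
    return {
        mode: drawn[i * n_per_mode:(i + 1) * n_per_mode]
        for i, mode in enumerate(modes)
    }

# Hypothetical stand-in for register IDs
frame = list(range(100_000))
groups = assign_modes(frame, n_per_mode=1250)
print({mode: len(ids) for mode, ids in groups.items()})
```

Because both groups come from the same frame by chance alone, any remaining difference in response rates or answers reflects the mode rather than the population sampled.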

Differences Due to Response Rate & Survey Mode: AUDIT Scores to Identify Problem Drinking, > 8 for Men; > 6 for Women (Källmén et al. 2011)

2. Mode & Measurement Error • Origins of measurement differences by mode • (1) Interviewers affect responses (e.g., telephone vs. web) • Get decreased reporting of undesirable attitudes and behavior in personal interviews • (2) Comprehension is affected by aural (phone) vs. visual (web) mode • Get visual layout effects, primacy, recency • Typically get a primacy effect on paper, recency on phone • More positive responses to scales on phone (when respondents do not see the scale) • (3) Different types of questions are asked in different modes • On the web, different response formats are used for multiple vs. single responses (not comparable to phone), e.g., checklists and grids • Show cards in personal interviews • Can include longer lists of response options in person, by mail, or on the web

Mode Bias Alters Link Between Gender & # of Sexual Partners, Tourangeau et al 2000

3. Move to Mixed-Mode Survey Designs (Dillman) • Benefits of Mixed-Mode Designs: • Lower cost: start with the least expensive method • Improve timeliness • The 2003 NSF earned degrees survey asked respondents which mode was best for them and used it in 2006, which improved response time • Reduce coverage error • Access to different kinds of people • Easier to provide incentives in some modes • By mail in an initial mailing • Improve RR and reduce non-response error • Use modes in sequence • Reduce measurement error on sensitive questions • But creates numerous complications for survey experiments

Specialized populations on the web • On occasion, may need to seek out special populations which are readily accessible on the web.

Mediator and Participant Recruitment Details, SMIS Studies
Study                          Data Collection Dates   Mediator Type   Mediators Contacted   Mediators Participated   Mediator Response Rate   Participants (N)   Yield
1. Culture Wars (2006)         6/6 - 7/31, 2006        Blogs/Forums    100                   24                       24%                      2248               93.7
2. Partisan Identity (2007)    5/16 - 6/4, 2007        Blogs/Forums    100                   4                        4%                       630                157.5
3. Partisan Identity (2008)    3/17 - 5/2, 2008        Blogs/Forums    178                   23                       13%                      3219               140.0
4. Campaign Ads (2007)         3/10 - 5/5, 2007        Blogs/Forums    198                   18                       9%                       1452               80.7
5. Political Metaphors (2007)  6/23 - 7/15, 2007       Blogs/Forums    50                    6                        12%                      297                49.5
Blog Average                   --                      Blogs/Forums    125.5                 15                       12.4%                    1569.2             104.3
6. Political Metaphors (2008)  4/15 - 5/13, 2008       RAs             4                     4                        100%                     141                35.3
Yield: Participants / # Mediators

Specialized populations • S15.3 European MSM Internet Survey (EMIS): differences in sexually transmissible infection testing in European countries • U Marcus et al. • Sex Transm Infect 2011;87:A19 doi:10.1136/sextrans-2011-050102.64 • Methods: From June through August 2010, the European MSM Internet Survey (EMIS) mobilised more than 180 000 respondents from 38 European countries to complete an online questionnaire in one of 25 languages. The questionnaire covered sexual happiness, HIV and STI-testing and diagnoses, unmet prevention needs, intervention performance, HIV-related stigma and gay-related discrimination. Recruitment was organised predominantly online, through gay social media, and links and banners on more than 100 websites for MSM all over Europe.

References • Druckman, James N. and Cindy D. Kam. 2011. “Students as Experimental Participants: A Defense of the ‘Narrow Data Base.’” In James N. Druckman, Donald P. Green, James H. Kuklinski, and Arthur Lupia, eds., Handbook of Experimental Political Science. • Cassese, Huddy, Hartman, Mason & Weber. 2012. Socially-Mediated Internet Surveys (SMIS): Recruiting Participants for Online Experiments, under review. • Dillman, Don A. 2009. Internet, Mail and Mixed Mode Surveys: The Tailored Design Method. 3rd ed. Hoboken, NJ: Wiley. ISBN: 9780471698685 (cloth) • Källmén, Håkan, Kristina Sinadinovic, Anne H. Berman, and Peter Wennberg. 2011. Nordic Studies on Alcohol and Drugs, Vol. 28. • Groves, Robert M. et al. 2009. Survey Methodology. 2nd ed. Hoboken, NJ: John Wiley & Sons. • Hecke & Weise. 2012. In Telephone Surveys in Europe, ed. Häider, Häider, & Kühne. Heidelberg: Springer.

References • Sinadinovic, Kristina, Peter Wennberg, and Anne H. Berman. 2011. Drug and Alcohol Dependence, 114:55-60. • Lavine, Johnson, Steenbergen. In press. The Ambivalent Partisan. • Tourangeau, Roger, Lance Rips and Kenneth Rasinski. 2000. The Psychology of Survey Response. New York: Cambridge University Press. ISBN: 0521576296. • Huddy, Leonie and Anna Gunthorsdottir. 2000. The Persuasive Effects of Emotive Visual Imagery: Superficial Manipulation or A Deepening of Conviction? Political Psychology. 21:745-778.
