ESADE Universitat Ramon Llull Barcelona

Making Internet Surveys Representative
Willem Saris
ESADE Universitat Ramon Llull Barcelona

The fundamental paradigm of survey research
• Sampling theory allowed one to estimate the opinion of the population on the basis of a limited number of people.
• Survey research became a standard procedure to collect information about the opinion in the population.
• Probability sampling has been the paradigm of survey research during the last 50 years.

The basic advantage of probability sampling
• Using probability sampling, the uncertainty in the outcomes can be quantified.
• One can also control this uncertainty by varying the sample size or by using advanced estimation procedures.
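As a rough illustration of how sample size controls uncertainty, the sketch below computes the approximate 95% margin of error for an estimated proportion. It assumes simple random sampling from a large population and the normal approximation; the function name and the example sample sizes are illustrative and do not come from the presentation.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of size n (z=1.96 corresponds to about 95%)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# Worst case p = 0.5 for a few illustrative sample sizes.
for n in (100, 400, 1600):
    print(n, margin_of_error(0.5, n))
```

Under these assumptions, quadrupling the sample size halves the margin of error.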

The disturbing reality
• Probability samples are complicated by
  - coverage error,
  - non-response error,
  - measurement error.
• Survey research within this paradigm requires considerable skill (Sudman and Bradburn 1983, Schuman and Presser 1981, Tourangeau 2000, Saris and Gallhofer 2007).

The new paradigms of Web surveys
• Couper: volunteer opt-in panels are the most popular.
• On various popular websites or via the media, people are asked to participate in surveys on the internet.
• Once a “sufficiently large” group of volunteers has been obtained, samples are drawn from this pool that agree with the population of interest on background statistics.
• These people are then asked to participate in a specific survey.

Advantages
• Companies can provide results in days where weeks were previously required.
• At a fraction of the cost of probability samples, results from even larger samples can be provided.

Disadvantages 1
• There is no statistical basis to generalize from the sample to the population (Horvitz and Thompson, 1952).
• People must have access to the internet (coverage error). Mostly these respondents are very different from the non-respondents (see e.g. Bethlehem, 2005; Lensvelt-Mulders & Lugtig 2006).
• The number of people who answer the questions relative to the number selected (a kind of response rate) is relatively low (Lozar Manfreda, Bosnjak, Haas, & Vehovar, 2005).
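The reference to Horvitz and Thompson (1952) can be made concrete with a small sketch of their estimator of a population total, which weights each sampled value by the inverse of its inclusion probability. The answers and inclusion probabilities below are made-up numbers for illustration only. In a volunteer opt-in panel these probabilities are unknown, which is why the estimator cannot be applied there.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total from a probability sample by weighting
    each observed value with the inverse of its inclusion probability."""
    if len(values) != len(inclusion_probs):
        raise ValueError("one inclusion probability is needed per sampled value")
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))

# Hypothetical sample: 1 = holds the opinion, 0 = does not.
answers = [1, 0, 1]
# Hypothetical, known inclusion probabilities from the sampling design.
inclusion_probs = [0.01, 0.02, 0.01]

estimated_total = horvitz_thompson_total(answers, inclusion_probs)
print(estimated_total)  # estimated number of people holding the opinion
```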

Disadvantages 2
• Often insufficient care is given to the formulation of the questions, so very different results are possible (Dillman, 2005).
• There is no interviewer who can help with difficult questions.
• No consistency checking: people simply quit the interview when confronted with questions about inconsistencies (Dillman, 2005).

Complete rejection
• Billiet has warned against these new surveys in several publications (Billiet 2004, Abt et al. 2005).
• He even blamed his colleagues for lending recognition to these methods by participating in the presentation of results of such surveys.

Statistical adjustments
• Harris Interactive has applied a statistical approach, weighting by propensity scores, to adjust the volunteer sample to a probability sample (Terhanian, 2001a and 2001b).
• The RAND Corporation has studied the possibility of weighting web surveys (Schonlau et al. 2002 and 2004).
• Other institutions in the US (see for example Lee 2006).
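The idea behind propensity score weighting can be sketched as follows. This is a minimal illustration, not the procedure used by the cited authors: instead of fitting a logistic regression, it estimates the propensity of being in the volunteer sample within cells of a single hypothetical covariate (age group), using a reference probability sample for comparison. All data and names are assumptions made for the example.

```python
from collections import Counter

def propensity_weights(reference_cells, web_cells):
    """Weight each web respondent by the inverse odds of being in the web
    sample rather than the reference sample, estimated per cell."""
    ref_counts = Counter(reference_cells)
    web_counts = Counter(web_cells)
    weights = []
    for cell in web_cells:
        n_web = web_counts[cell]
        n_ref = ref_counts[cell]
        propensity = n_web / (n_web + n_ref)  # estimated P(web sample | cell)
        weights.append((1 - propensity) / propensity)
    return weights

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical reference (probability) sample: evenly split by age group.
reference_cells = ["young"] * 5 + ["old"] * 5
# Hypothetical volunteer web sample: young people are over-represented.
web_cells = ["young"] * 8 + ["old"] * 2
web_answers = [1] * 6 + [0] * 2 + [0, 0]  # 1 = agrees, 0 = disagrees

weights = propensity_weights(reference_cells, web_cells)
unweighted = sum(web_answers) / len(web_answers)
adjusted = weighted_mean(web_answers, weights)
print(unweighted, adjusted)
```

The adjustment only removes differences that are captured by the covariates used to estimate the propensity; any remaining relationship between participation and the answers is left uncorrected.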

European Statisticians
• Statisticians in Europe have also started to look at these approaches (Varedian and Forsman, 2003; Forsman and Isaksson 2003; Danielsson 2004; Isaksson, Danielsson and Forsman 2004).
• Comparisons of results of access panels and probability samples have been done (see for example Schoen 2004, Faas and Schoen 2006, Oberski 2006, 2007).

Joint research
• Joint research has been done by researchers from the old and the new approach (Schonlau, Van Soest, Kapteyn, Couper and Winter, 2004).

Results of comparisons
• Some of these studies showed that the results of access panels were not significantly different from those of probability samples, mostly after some correction by weighting.
• Others showed differences.
• But we do not yet know when which result will occur.

A Dutch study of web surveys
• In 2006 Zembla asked about the use of a computer program (Stemwijzer) via an access panel.
• The estimate of the use of the program was off by as much as 34 percentage points.
• Only 20% of the people used the program, while the access panel suggested 54%.

Further research
• The TV program asked three different companies to ask their panels whether they agreed or disagreed with the following three statements.

Results for the three questions
• “The program gives an advice which is taken too seriously by many people”: the results varied between 42% and 52%.
• “The program should emphasize more that it only gives an advice”: the results varied between 57% and 73%.
• “The program has too much influence on the elections”: the results varied between 19% and 44%.

Conclusions
• The differences can be rather large.
• Differences will occur when there is a relationship between the reasons for participation and the variable of interest.
• The problem is that we do not know when this is the case.
• Our knowledge about the reasons for participation in such access panels is still rather limited.
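The second conclusion can be illustrated with a small numerical sketch. It compares the population mean with the approximate expected mean among panel participants when each person joins with their own participation probability. The population and the probabilities are hypothetical and are not the Zembla data; the approximation used is the ratio of expected totals.

```python
def population_mean(y):
    return sum(y) / len(y)

def expected_panel_mean(y, participation_probs):
    """Approximate expected mean among participants when person i joins
    the panel with probability participation_probs[i]."""
    weighted_total = sum(p * v for p, v in zip(participation_probs, y))
    return weighted_total / sum(participation_probs)

# Hypothetical population of 10 people: 2 used the program (1), 8 did not (0).
y = [1, 1] + [0] * 8

# Case 1: participation is unrelated to the variable of interest.
unrelated = [0.02] * 10
# Case 2: users of the program are four times as likely to participate.
related = [0.04, 0.04] + [0.01] * 8

print(population_mean(y))                 # true share of users
print(expected_panel_mean(y, unrelated))  # close to the true share
print(expected_panel_mean(y, related))    # well above the true share
```

In the first case the panel reproduces the population share of 20%; in the second it suggests about 50%, purely because participation is related to the variable being measured.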

What we know about participation 1
• People without internet access cannot participate.
• Less participation can be expected of
  - older people,
  - people from non-western countries,
  - people who are less involved in society,
  - people with less interest in the topic (Faas and Schoen 2006),
  - people who are less politically interested (Vehovar 2002 and Bosnjak 2002).

What we know about participation 2
• People who participate more are people who do it for money.
• In the Netherlands, 20% of the people participating in the panels answer 80% of the questions.

Alternative 1: Web panels
• Couper (2000) suggests that the best option is to use web surveys based on probability samples, providing equipment to households where necessary.
• The procedure was already developed in 1986 under the name Telepanel.
• Now available at CentERdata and Knowledge Networks.

Advantages and disadvantages
• A probability sample is used.
• A lot of information is available about the respondents.
• It is a lot of work to manage such a panel.
• The response rate is much lower than in cross-sectional research.

Possible corrections
• A lot of information is available.
• But correction for background variables alone is not enough.
• Correction is also needed for variables related to nonparticipation, such as political interest (Voogt).
• However, the reasons for nonparticipation differ between countries (ESS).
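Correcting for a variable related to nonparticipation can be sketched as post-stratification: the mean is computed within each level of that variable and then combined using the population shares of the levels. The sketch below assumes that the population shares of political interest are known from an external source; the shares, the strata labels, and the answers are all hypothetical.

```python
from collections import defaultdict

def poststratified_mean(sample, population_shares):
    """sample: iterable of (stratum, value) pairs.
    population_shares: mapping from stratum to its share in the population."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        totals[stratum] += value
        counts[stratum] += 1
    estimate = 0.0
    for stratum, share in population_shares.items():
        if counts[stratum] == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        estimate += share * totals[stratum] / counts[stratum]
    return estimate

# Hypothetical panel: politically interested people are over-represented.
sample = (
    [("interested", 1)] * 6 + [("interested", 0)] * 2
    + [("not interested", 1)] + [("not interested", 0)]
)
# Hypothetical population shares, assumed known from another source.
population_shares = {"interested": 0.5, "not interested": 0.5}

unweighted = sum(value for _, value in sample) / len(sample)
corrected = poststratified_mean(sample, population_shares)
print(unweighted, corrected)
```

The correction works only to the extent that, within each level of political interest, participants and nonparticipants give similar answers.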

Alternative 2: Mixed mode data collection
• Draw a probability sample.
• Ask potential respondents with internet access to fill in a web survey and the others a mail questionnaire.
• The people who do not reply are contacted by telephone and asked to participate in a telephone or face-to-face interview.
• If necessary, ask them to answer only a few central questions.
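The steps of this design can be written down as a simple decision rule. The sketch below is only one possible reading of the procedure; the class, field names, and mode labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SampledPerson:
    has_internet: bool            # can be approached with a web survey
    replied_first_contact: bool   # answered the web or mail questionnaire
    accepts_full_interview: bool  # agrees to a full follow-up interview

def assign_mode(person):
    """Return the data collection mode for one member of the probability sample."""
    if person.replied_first_contact:
        return "web" if person.has_internet else "mail"
    # Nonrespondents are contacted by telephone for a follow-up interview.
    if person.accepts_full_interview:
        return "telephone or face-to-face"
    # Fallback: ask only a few central questions.
    return "central questions only"

people = [
    SampledPerson(True, True, True),
    SampledPerson(False, True, True),
    SampledPerson(True, False, True),
    SampledPerson(False, False, False),
]
modes = [assign_mode(p) for p in people]
print(modes)
```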

Advantages
• A probability sample is used, so generalization is possible.
• Higher response rates, up to 90%, are possible.
• Sufficient information is available, so that weighting on central questions will adjust the estimates nearly perfectly (Voogt 2004).

Disadvantages
• If the data collection process takes too long, mode effects can be expected.
• For sensitive or complex questions, mode effects can also be expected.

Conclusions
• The scientific basis for generalization in volunteer opt-in panels is very questionable.
• Single-mode cross sections and web panels are plagued by low response rates.
• Cross-sectional research among web users may be possible in the future.
• At this moment, mixed mode data collection may be a solution for simple, nonsensitive topics.
