Creating a Successful Survey Anne Ryan Faculty Collaborator for LISA Visiting Assistant Professor Department of Statistics, VT Laboratory for Interdisciplinary Statistical Analysis Marcos Carzolio Associate Collaborator for LISA Graduate Student Department of Statistics, VT
Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers benefit from the use of Statistics: Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...) Our goal is to improve the quality of research and the use of statistics at Virginia Tech. www.lisa.stat.vt.edu
How can LISA help? • Formulate research question. • Screen data for integrity and unusual observations. • Implement graphical techniques to showcase the data – what is the story? • Develop and implement an analysis plan to address research question. • Help interpret results. • Communicate! Help with writing the report or giving the talk. • Identify future research directions.
Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers benefit from the use of Statistics: Designing Experiments • Analyzing Data • Interpreting Results • Grant Proposals • Using Software (R, SAS, JMP, Minitab...) Collaboration: From our website request a meeting for personalized statistical advice. Great advice right now: Meet with LISA before collecting your data. Walk-In Consulting: Monday-Friday 1-3 pm GLC Video Conf. Room; Mondays and Fridays 3-5 pm in 312 Sandy Hall; Tuesdays and Wednesdays 11-1 pm in Port; Thursdays 9:30-11:30 am ICTAS Café X; for questions requiring <30 mins. Short Courses: Designed to help graduate students apply statistics in their research. All services are FREE for VT researchers. www.lisa.stat.vt.edu
3 Stages of Statistical Thinking • Design – How do we obtain the data? • Description – How do we summarize the data? • Statistical Summaries • Graphical Summaries • Inference – How do we make decisions/predictions based on data?
Outline: Elements of Survey Design • Clearly Define Research Objectives • Define Population to Be Sampled • Develop Sampling Plan • Data Collection Options • Errors with Surveys • Questionnaire Design • Pretest • Histograms, boxplots, and Scatterplots • Factor Analysis
Clearly Define Research Objectives • State CLEARLY and CONCISELY your • Overall Research Goals • Specific Scientific Questions • Refer to these objectives constantly throughout the design of your survey to ensure your survey is answering the desired questions of interest.
Define Population to Be Sampled Who will you interview to answer your research questions? The overall group of interest or the target group is the population. • Subject: Any material we measure. • Plant, Person, Piano etc. • Population: representation of all the possible outcomes or measurements of interest. • Sample: Subset of the population to be measured (i.e. group of subjects that represent the population).
Sampling Plan • Once the target population has been identified, the sampling plan must be devised. • Goal: Randomly select a small percentage of the population that will in turn represent the ideas of the population as a whole. • The sampling plan involves: • The technique used to select the subjects for your study. • Simple Random Sampling • Stratified Random Sampling • Cluster Sampling • Systematic Sampling • The number of people needed for your study. • Sample size calculations.
Simple Random Sampling • Subjects chosen by random mechanism. • Each subject has an equal chance of being part of the study. • Easiest to summarize BUT most tedious to implement in the field. Example: Randomly select 10 students from the Stat 3005 class roster to ask a question.
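The roster example can be sketched in Python. This is a minimal illustration, assuming a hypothetical 40-name roster (the names and the `simple_random_sample` helper are invented for the example); `random.sample` draws without replacement, so every student has the same chance of being chosen.

```python
import random

def simple_random_sample(roster, k, seed=None):
    """Draw k distinct subjects; every subject is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(roster, k)

# Hypothetical roster standing in for the Stat 3005 class list.
roster = [f"student_{i:02d}" for i in range(1, 41)]
chosen = simple_random_sample(roster, 10, seed=1)
print(chosen)
```

In a real survey the roster would be the full sampling frame, and the seed would normally be left unset or recorded for reproducibility.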
Stratified Random Sampling • First divide population into strata (groups) based on similarity. • Then randomly select subjects within each stratum. • Easier to implement. • May result in more precise summary. Example: Randomly select 5 male students and randomly select 5 female students from the STAT 5615 class roster to ask a question.
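A minimal Python sketch of the same idea, assuming a hypothetical roster of (name, sex) pairs; the `stratified_sample` helper and the data are illustrative only. Subjects are grouped by stratum first, and a simple random sample is then drawn inside each group.

```python
import random

def stratified_sample(subjects, stratum_of, per_stratum, seed=None):
    """Group subjects by stratum, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for s in subjects:
        strata.setdefault(stratum_of(s), []).append(s)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster of (name, sex) pairs standing in for STAT 5615.
roster = [(f"student_{i:02d}", "M" if i % 2 else "F") for i in range(1, 31)]
picked = stratified_sample(roster, lambda s: s[1], per_stratum=5, seed=1)
print(picked)
```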
Cluster Sampling • Population has many clusters. • First randomly select a number of clusters. • Then sample all the units within each cluster. • Requires clusters to be representative of the population. Example: Population: opinions of all students (attending class) at VT 1) Randomly select a certain number of classes 2) Ask all students in each class their opinion. Note: Cluster sampling is often NOT as efficient as stratified sampling for surveying.
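The two steps can be sketched in Python as follows, assuming hypothetical classes and student labels (none of this data comes from the slides): whole classes are selected at random, and every student in a selected class is surveyed.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [u for name in chosen for u in clusters[name]]
    return chosen, units

# Hypothetical classes, each mapping to the students attending it.
classes = {f"class_{c}": [f"{c}_student_{i}" for i in range(1, 6)]
           for c in "ABCDEF"}
chosen_classes, surveyed = cluster_sample(classes, 2, seed=1)
print(chosen_classes, len(surveyed))
```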
Systematic Sampling • Select every k-th subject from an ordered list, usually starting from a randomly chosen position. Example: Telemarketers select every 10th phone number in the Yellow Book to make marketing calls.
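A minimal Python sketch of systematic sampling, assuming a hypothetical list of 100 phone numbers. The random starting offset is a common convention rather than something stated on the slide; after the start is fixed, every 10th entry is taken.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Pick a random start in the first `step` entries, then take every step-th one."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return frame[start::step]

# Hypothetical directory listing; the numbers are made up.
phone_book = [f"555-{i:04d}" for i in range(100)]
calls = systematic_sample(phone_book, 10, seed=1)
print(calls)
```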
Determine the sampling technique for the following situations: • You are studying sleeping patterns among freshmen, sophomores, juniors, and seniors at Virginia Tech. You group the students based on grade level and then take a simple random sample of 10 students from each grade level. • Stratified Sampling • You are studying sleeping patterns at Virginia Tech. From the registrar you obtain a master list of students at Virginia Tech. You then randomly select 5,000 students to survey about their sleeping habits. • Simple Random Sample
• A light bulb manufacturer produces approximately 100,000 light bulbs per day. The quality control department must monitor the defect rate of the bulbs. Testing each bulb would be costly and inefficient, so the department decides to test every 100th bulb produced. • Systematic Sampling • You are studying the sleeping patterns of college students. From a list of all the colleges and universities across the country, you perform a simple random sample to select 10 colleges/universities. Then you measure every student attending the 10 colleges/universities. • Cluster Sampling
Sample Size Calculation • How many people do we interview? • Answer: It depends. • Sample size calculations can be computed using statistical methods. (Come to LISA; we can help!) • Sample size calculations also involve characteristics of the study: • Time, money, precision. • For many Gallup polls, the population of interest is all adult Americans. To represent this population, the sample usually consists of around 1,000 adults. • Once the sample size reaches around 500 or more, each further increase in sample size yields a smaller and smaller gain in accuracy.
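The diminishing returns can be illustrated with the usual normal-approximation margin of error for a proportion. This is a sketch under assumptions not stated on the slide: a simple random sample, the worst-case proportion p = 0.5, and a 95% confidence level (z = 1.96).

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2000):
    print(n, round(margin_of_error(n), 3))
```

Under these assumptions the margin of error is about 9.8 percentage points at n = 100, 4.4 at n = 500, 3.1 at n = 1,000, and 2.2 at n = 2,000, so doubling the sample from 1,000 to 2,000 buys less than one additional point.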
Data Collection Options • Once we know the subjects we want to survey, we must determine the best instrument for collecting data. • Data Collection Options: • Personal Interviews • Telephone Interviews • Mail Surveys • Email Surveys • For more discussion of data collection options see http://www.surveysystem.com/sdesign.htm.
Personal Interviews • A face-to-face encounter between the interviewer and the subject. • Advantages: • People usually respond when confronted face-to-face • Can get a better sense of the reaction of the subject • Prevent misunderstandings • Disadvantages: • More Costly • Interviewers who are not trained properly may introduce bias into the sample.
Telephone Interviews • Most popular survey instrument in the United States, since 96% of homes have telephones. • Personal interviews and telephone interviews are usually the most successful forms of surveying, with response rates around 60 to 75%. • Advantages: • Less expensive than personal interviewing • Random phone numbers can be dialed • Fast results • Disadvantages: • People are reluctant to answer phone interviews • Phone calls can usually only be made from around 6pm-9pm • Phone surveys normally need to be shorter in length than personal interviews
Mail Surveys • Advantages: • Cheap • Questionnaire can include pictures • People are able to answer on their own time • Disadvantages: • Time-consuming process • Response rates have a tendency to be low
Email Surveys • Advantages • Cheap • Fast • You can attach pictures or sound files • Disadvantages • People may respond multiple times • People who have email may not be representative of the population as a whole
Nonresponse Bias In a national sample of board-certified physicians, a short survey was mailed asking physicians to nominate the five best hospitals in their specialty regardless of cost or location. Up to three follow-ups were mailed to nonresponders to gain participation. The final response rate was 47.3%. Males were significantly more likely to respond than females, which would not be an issue if men and women answered in the same way… But, men were significantly more likely to nominate one or two top hospitals in their specialty. In addition, women were significantly more likely to nominate hospitals only in their region.
Nonresponse Bias • Definition: Survey error that happens when respondents are different from nonrespondents in a significant way • Problems: • Filters out certain types of respondents • The reason for which a person responds (or, conversely, does not respond) to a survey is related to the subject of the survey • Possible Solutions: • Provide incentives for completing survey • Explain why survey is important • Keep survey short and sweet • Give more weight to answers from hard-to-reach respondents (Come to LISA)
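One common way to "give more weight" to under-represented respondents is post-stratification: each group's mean is weighted by its known population share rather than by its share of the respondents. The sketch below assumes that is the kind of weighting intended; the groups, counts, and population shares are hypothetical.

```python
def poststratified_mean(responses, population_share):
    """Weight each group's mean by its population share, not its respondent share.

    responses: dict mapping group -> list of numeric answers (0/1 here).
    population_share: dict mapping group -> share of the target population.
    """
    return sum(population_share[g] * sum(vals) / len(vals)
               for g, vals in responses.items())

# Hypothetical data: men are over-represented among respondents.
responses = {"men": [1] * 45 + [0] * 15, "women": [1] * 8 + [0] * 12}
population_share = {"men": 0.5, "women": 0.5}  # assumed known from another source

all_answers = [a for vals in responses.values() for a in vals]
unweighted = sum(all_answers) / len(all_answers)
weighted = poststratified_mean(responses, population_share)
print(unweighted, weighted)
```

With these made-up numbers the unweighted estimate is 0.6625 and the weighted estimate is 0.575. The adjustment only helps if respondents within a group resemble the nonrespondents in that group.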
Measurement Error In a study about measurement error in earnings data, respondents were asked to report their annual wages. The reported wages were then compared to earnings statements on detailed W-2 records. Not surprisingly, the study found that respondents tended to over-report their wages when compared to their W-2 records. Also, the discrepancy between reported and official wages decreased as official wage increased.
Measurement Error • Definition: Inaccurate answers to survey questions (sometimes due to lack of clarity in writing) • Problems: • Makes it difficult to judge if answers are accurate • May lead to incorrect conclusions about target population • Possible Solutions: • Write clear, concise questions • Be aware of leading questions • Be aware of social factors that may influence responses • Explain why survey is important
Coverage Error • Definition: Not all members of a population have a known, nonzero chance of being selected for survey • Problem: Survey may turn out to be biased • Possible Solutions: • Identify target population (might require some expertise in the subject of the survey) • Construct a sampling frame - a list of all possible respondents • Avoid: duplicates; respondents that are outside of target population; and excluding a portion of target population • Randomize
Sampling Error • Definition: Inherent inaccuracy due to one’s inability to sample entire population • Problem: Variability among individual respondents makes it difficult to learn about group as a whole • Possible Solutions: • Find right sample size (Come to LISA) • Know difference between sample and population
Questionnaire Design • Our goal in this section is to comment on some of the important aspects of questionnaire design. • An article appearing in the International Journal of Market Research gives great advice about questionnaire design. This YouTube video summarizes the findings in the article: http://www.youtube.com/watch?v=53mASVzGRF4. • We will discuss the following topics associated with questionnaire design. This list of topics is not comprehensive, so we suggest that you explore the topic of questionnaire design further. • Length • Question Ordering • Don’t Know Option • Open versus Closed Questions • Wording • Scaling Questions
Length of Questionnaire • Keep the questionnaire as short as possible. • Creative Research Systems offers the following useful suggestions (http://www.surveysystem.com/sdesign.htm): • Follow the “KISS” method, meaning “Keep it short and simple!” • Categorize questions into 3 groups: • Must Know • Useful to Know • Nice to Know • If the questionnaire seems too long, start omitting the “nice to know” questions. • Don’t get caught in the trap where you find that you have a captive audience, so you begin asking questions that are not pertinent.
Question Order Effects • Priming: Early questions refresh respondents’ memory for subsequent questions • Carryover: Respondents believe questions are similar and answer them with same criteria • Consistency: Respondents answer questions similarly to try to appear consistent • Norm of Evenhandedness: Respondents answer questions similarly to try to be fair • Anchoring: Early questions set a standard for comparison to later questions • Subtraction: Considerations in answers to early questions are left out of subsequent judgments • Avoiding Extremeness: Respondents try to seem neutral by choosing some items while rejecting others
Priming An NIH Survey on Disability asked respondents to list causes of their disabilities. Nearly 49% of respondents who were previously asked about sensory impairments reported those as the causes for their disability, while only 41% of those who had not previously been asked about sensory impairments reported the same causes.
Carryover • General questions should precede specific questions. • A study was conducted in 1979 to determine a person’s overall happiness and a person’s happiness in their marriage. • Possible ordering for questions: • General happiness question first followed by specific question concerning happiness in marriage. • Specific question concerning happiness in marriage first followed by general happiness question. • Results: Over 60% of respondents indicated that they were very happy in their marriage. • General happiness question followed by specific marriage happiness question: 52% responded they were very happy in general. • Specific marriage happiness question followed by general happiness question: 38% responded they were very happy in general. • Overall, respondents were happier with their marriage than with life in general. • Asking the marriage question first caused people to rate their overall happiness lower.
Consistency Three questionnaires about criminals were administered to students, where one was strongly worded against criminals, another was biased toward leniency for criminals, and the third was constructed to be neutral. Afterwards, the students were asked to complete scales measuring their opinions about criminals. Student responses tended to reflect a similar level of leniency to the questionnaire they answered beforehand.
Norm of Evenhandedness Students at Washington State University were asked about the consequences of plagiarism. Two questions in particular were given: “Should a student who plagiarizes be expelled?” and “Should a professor who plagiarizes be fired?” When the professor question was asked first, 34% of respondents indicated on the student question that students should be expelled. But when the professor question was asked second, only 21% indicated that students should be expelled.
Anchoring In 1997, a Gallup poll asked respondents “Do you generally think Bill Clinton is honest and trustworthy?” and “Do you generally think Al Gore is honest and trustworthy?” in different orders. When the Bill Clinton question was asked first, 50% stated that he was honest, then 60% answered that Gore was honest. But when the Gore question was asked first, 68% answered that he was honest, then 57% responded that Clinton was honest.
Subtraction In 1994, a survey asked respondents how they would describe the economic situation of their communities over the next 5 years and how they felt about the economic situation in their state over the next 5 years. The survey found that 7-10% more people responded that the state economy would get better when the state economy question was asked before the community economy question. The conclusion of the study was that people tend to remove considerations from subsequent questions after they have been used in previous questions.
Avoiding Extremeness Students were presented a survey about the controversial topics of euthanasia and reduced training for doctors. Then half of them were told they would interact with another student about the topics face-to-face, while the other half were told they would listen to a recording of another student talking about the subject. Before they would proceed, however, they were given more questions relating to the topics. Students who were told they would interact face-to-face with other students answered more moderately than the students who were told they would only listen to a recording. In general, people tend to be more moderate in social settings.
Question Order • Group related questions together • Choose first question carefully. The first question should: • Apply to everyone • Be easy to read • Be interesting • Place sensitive questions near the end • Give respondents a chance to become comfortable with questionnaire • Ask about sequential events in the order that they occurred • Avoid unintended question order effects
Question Order • The following demographic questions should be saved for the end of the questionnaire. • Age, education, income, marital status, etc. • Ensures that respondents will not feel that they are losing their anonymity when answering the rest of the questions. • Choose the most important questions for your survey to be asked at the beginning of the survey.
Don’t Know Option • Add a “don’t know” or “not applicable” option to all questions unless you are positive that every respondent will have an answer or will feel comfortable answering the question. • You do not want people to feel as though they are being forced to give an answer. • An alternative to the “don’t know” option: • Create screening questions before the actual question to determine if the respondent has the knowledge to answer the question. • If it is determined the respondent has the background knowledge, the question is given without a “don’t know” option. • If the respondent does not have the background knowledge, then the respondent may skip that question completely.
Open versus Closed Questions • Open questions allow the respondent to freely answer the question. • Fewer restrictions, allowing for more depth in the overall answer. • Closed questions force the respondent to answer the question by choosing from predetermined choices. • Advantage: Ease in analysis. • One suggestion is to test the survey on a small group with an open question. From those responses form a closed question that encompasses the categories expressed in the responses to the open questions. • Allow for an “other” option in closed questions, to permit respondents to write their own responses.
Wording • Refrain from having two concepts embedded in one question. • Example: “Do you have time to read the newspaper every day?” • Notice you are asking about “time” and “reading the newspaper every day”. • A better revision: “Do you read the newspaper every day?” • If the answer is no, you can create a question to determine the reasons the person does not read the newspaper. • Refrain from negatively worded questions. • Example: “Students should not be required to attend weekly colloquium.” • Agree • Disagree • Example: Question: “What is your view on the concept that students should not be unhelpful with recruiting new graduate students to the statistics department?” Revision: “What is your view on the concept that students should be helpful with recruiting new graduate students to the statistics department?”
Scaling Questions • A popular technique in survey design is the use of scaling questions. • Respondents are able to select a number or category that represents their answer to the survey question. • Likert scaling is a common technique used in questionnaires. • A Likert item is a question or statement on a questionnaire where the respondent gives a rating for their response on a topic. • The rating is usually the level of agreement the respondent has concerning the statement or question. • A Likert item is balanced, meaning there is an equal number of positive and negative positions. http://en.wikipedia.org/wiki/File:Example_Likert_Scale.jpg
Scaling Questions • Research reports that 5-point and 7-point scale responses are the most common. • The inclusion of the middle option increases the validity and reliability of a response scale slightly. • Example: Disagree, Slightly Disagree, Neither agree nor disagree, Slightly Agree, Agree • Likert items can be analyzed separately or the items may be summed and the sum can be analyzed. The sum of Likert items is called the Likert Scale.
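Summing Likert items into a Likert Scale score can be sketched as below. The labels follow the five-point example; the numeric codes 1 to 5 and the sample answers are assumptions made for illustration.

```python
# Numeric coding of the five-point labels (the codes themselves are an assumption).
CODES = {
    "Disagree": 1,
    "Slightly Disagree": 2,
    "Neither agree nor disagree": 3,
    "Slightly Agree": 4,
    "Agree": 5,
}

def likert_scale_score(answers):
    """Sum the coded Likert items for one respondent."""
    return sum(CODES[a] for a in answers)

# Hypothetical respondent answering four items.
respondent = ["Agree", "Slightly Agree", "Neither agree nor disagree",
              "Slightly Disagree"]
score = likert_scale_score(respondent)
print(score)
```

If some items are negatively worded, their codes would normally be reversed before summing so that a higher total always points in the same direction.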
Pretest • A pretest of the survey to a smaller sample is suggested if possible. • This pretest can • Allow you to revise the questionnaire if needed. • Allow you to create a closed question from the responses for an open question. • Help you estimate the variability in the responses to your questions.
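Sample Size Calculation for a Proportion Let n = sample size, σ = standard deviation, d = confidence interval size (full width), α = significance level. Then, to get a (1-α)*100% confidence interval of width d, we need a sample size of: $n = \left( \frac{2\, z_{\alpha/2}\, \sigma}{d} \right)^2$, where $z_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution.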
Sample Size Calculation for a Proportion For example, suppose we want an estimate for a 95% confidence interval of width 0.2 (meaning we have a 0.1 margin of error). If we know from a pilot study that the standard deviation of the population is 0.5, then σ = 0.5, d = 0.2, α = 0.05. Plugging these numbers into the previous equation (with the standard normal quantile 1.96 for α = 0.05), we get n ≈ 96.04, which means we need to sample 97 people.
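The worked example can be checked in a few lines of Python. This sketch assumes the formula is n = (2·z·σ/d)², with z the standard normal quantile for the chosen confidence level and d the full width of the interval; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, width, alpha=0.05):
    """Return (exact n, n rounded up) for a confidence interval of the given full width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    n = (2 * z * sigma / width) ** 2
    return n, ceil(n)

n_exact, n_needed = sample_size(sigma=0.5, width=0.2, alpha=0.05)
print(round(n_exact, 2), n_needed)
```

The result is always rounded up, because rounding down would give an interval slightly wider than the target.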
Mozambican Survey • Asked people living in villages to rate how “painful” a task it was to fetch water on a 6-point Likert scale (ranging from 1- not painful at all, to 6- extremely painful) • Question was given to households in villages both with and without a water pump • Some households, especially those without water pumps, must travel hours per day to fetch water • How can we best depict the resulting data? • Histograms • Box-and-whisker plots
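Before plotting, the numbers behind a histogram (counts per rating) and a box-and-whisker plot (five-number summary) can be computed directly. The sketch below uses hypothetical ratings, not the actual survey data, and the inclusive quartile method is only one of several conventions.

```python
from collections import Counter
from statistics import quantiles

def histogram_counts(ratings, levels=range(1, 7)):
    """Count how many responses fall on each point of the 6-point scale."""
    counts = Counter(ratings)
    return {level: counts.get(level, 0) for level in levels}

def five_number_summary(ratings):
    """Minimum, quartiles, and maximum: the values a box-and-whisker plot draws."""
    q1, median, q3 = quantiles(ratings, n=4, method="inclusive")
    return min(ratings), q1, median, q3, max(ratings)

# Hypothetical ratings (1 = not painful at all, 6 = extremely painful).
with_pump = [1, 1, 2, 2, 2, 3, 3, 4, 2, 1]
without_pump = [4, 5, 5, 6, 6, 3, 6, 5, 4, 6]

for label, ratings in (("with pump", with_pump), ("without pump", without_pump)):
    print(label, histogram_counts(ratings), five_number_summary(ratings))
```

Comparing the two groups side by side, with one histogram or box per group, shows whether households without a pump rate the task as more painful.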