
New Experiments on the Design of Complex Survey Questions

New Experiments on the Design of Complex Survey Questions. Paul Beatty, National Center for Health Statistics Collaborators: Jack Fowler and Carol Cosenza, Center for Survey Research, University of Massachusetts-Boston.



  1. New Experiments on the Design of Complex Survey Questions Paul Beatty, National Center for Health Statistics Collaborators: Jack Fowler and Carol Cosenza, Center for Survey Research, University of Massachusetts-Boston

  2. Optimal structure and presentation of explanatory material in survey questions • Many survey questions are complex, particularly on behavioral surveys • This complexity is driven by: • The desire for very specific data points • The need to collect data as efficiently as possible (i.e. single questions if possible) • A few common practices: • Presentation of material that follows the question mark • The use of examples to illustrate complex concepts • Detailed wording to capture relatively rare events • What alternatives do we have? Are they better?

  3. Methods • Split ballot experimentation in RDD survey (n=425) • Original questions drawn from federal health surveys; we constructed alternative questions • Do responses differ across versions? • If so, can we judge which distribution is more plausible? • Behavior coding random subset of tape recorded interviews (n=313) • How often were initial responses inadequate? • How often do respondents interrupt the question? • How often did interviewer do something more than just read the question to get a response? • How often did respondents ask for repeat, clarifications, and so on?
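The split-ballot assignment and behavior-coding tallies described above can be sketched in code. This is a hypothetical illustration of the design, not the authors' actual procedure; all function and variable names are invented.

```python
import random
from collections import Counter

# Behavior codes of the kind tallied from tape-recorded interviews
# (labels are illustrative, not the study's actual coding scheme).
BEHAVIOR_CODES = ["adequate", "inadequate_initial", "interrupted", "requested_help"]

def assign_version(rng: random.Random) -> str:
    """Randomly assign a respondent to question version V1 or V2 (split ballot)."""
    return rng.choice(["V1", "V2"])

def tally_codes(records):
    """Count behavior codes separately for each question version.

    `records` is an iterable of (version, code) pairs produced by
    behavior coding of recorded interviews.
    """
    tallies = {"V1": Counter(), "V2": Counter()}
    for version, code in records:
        tallies[version][code] += 1
    return tallies

rng = random.Random(42)  # fixed seed so the sketch is reproducible
versions = [assign_version(rng) for _ in range(425)]  # n=425 as in the RDD survey
```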

  4. Issue #1: Info after the question mark • It is common for questions to apparently end but then add some more material: • In the past 12 months, how many times have you talked to any health professional about your own health?

  5. Issue #1: Info after the question mark • It is common for questions to apparently end but then add some more material: • In the past 12 months, how many times have you talked to any health professional about your own health? Include in-person visits, telephone calls, or times you were a patient in a hospital.

  6. Issue #1: Info after the question mark • It is common for questions to apparently end but then add some more material: • In the past 12 months, how many times have you talked to any health professional about your own health? Include in-person visits, telephone calls, or times you were a patient in a hospital. • Concern: Do respondents pay adequate attention to this material? Failure to consider it could lead to under-reports.

  7. Issue #1: Info after the question mark • It is common for questions to apparently end but then add some more material: • In the past 12 months, how many times have you talked to any health professional about your own health? Include in-person visits, telephone calls, or times you were a patient in a hospital. • Concern: Do respondents pay adequate attention to this material? Failure to consider it could lead to under-reports. • Alternative: • People talk to health professionals in person, over the phone, or as a patient in a hospital. Including any of those, in the past 12 months how many times have you talked to a health professional about your own health?

  8. Results – Experiment 1

     Qualifier placement:                 V1 (after q.)   V2 (begin of q.)   signif.
     Contacts w/health prof in 12 months  6.6             5.9                n.s.
                                          (n=214)         (n=206)
     Initial response inadequate          32.5%           25.5%              n.s.
     Respondent requested help            20.0%           13.1%              p<.1
                                          (n=160)         (n=153)
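Significance labels for comparisons of proportions, such as the 20.0% vs. 13.1% help-request rates above, can be checked with a standard two-proportion z-test. The sketch below uses only the Python standard library; the slide does not say which test the authors used, and with these figures the pooled-SE test lands right at the p≈.10 boundary.

```python
from math import sqrt, erf

def two_prop_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-tailed p-value for a two-proportion z-test with pooled standard error."""
    x1, x2 = p1 * n1, p2 * n2            # implied counts of "successes"
    pooled = (x1 + x2) / (n1 + n2)       # pooled proportion under H0: p1 == p2
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # standard normal survival probability, doubled for a two-tailed test
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# "Respondent requested help": 20.0% of 160 (V1) vs. 13.1% of 153 (V2)
p_value = two_prop_z(0.200, 160, 0.131, 153)
```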

  9. Issue #2: Related experiment– definition after the question mark • Definitions are sometimes presented after the question mark as well. For example: • V1: Have any of your immediate blood relatives ever been told by a doctor that they have diabetes? By "immediate blood relatives", we mean your parents, your children, and your brothers and sisters, whether or not they are still living.

  10. Issue #2: Related experiment– definition after the question mark • Definitions are sometimes presented after the question mark as well. For example: • V1: Have any of your immediate blood relatives ever been told by a doctor that they have diabetes? By "immediate blood relatives", we mean your parents, your children, and your brothers and sisters, whether or not they are still living. • V2: The next question is about immediate blood relatives-by that, we mean your parents, your children, and your brothers and sisters, whether or not they are still living. Have any of your immediate blood relatives ever been told by a doctor that they have diabetes? • If the definition is easier to ignore in V1, respondents might interpret “blood relatives” more broadly than intended, leading to (erroneously) higher reports in V1.

  11. Results – Experiment 2

      Definition placement:         V1 (after q.)   V2 (begin of q.)   signif.
      Relative w/diabetes           42.6%           34.4%              p<.1
                                    (n=209)         (n=215)
      Initial response inadequate   7.2%            2.5%               p<.1
      Interrupted                   16.5%           0.6%               p<.01
      Interviewer intervention      9.2%            3.1%               p<.05
                                    (n=152)         (n=159)

  12. Issue #3: Administration of response categories • Conventional wisdom dictates that you administer the question before offering response categories: • V1: The last time you went to see a doctor, which of the following best describes the main reason for your visit? • Medical treatment for a new condition • Follow-up care for an existing condition • Or, a routine checkup • But what if this design encourages respondents to gravitate toward the first seemingly acceptable response rather than considering the whole list?

  13. Issue #3: Administration of response categories • Conventional wisdom dictates that you administer the question before offering response categories: • V1: The last time you went to see a doctor, which of the following best describes the main reason for your visit? • Medical treatment for a new condition • Follow-up care for an existing condition • Or, a routine checkup • But what if this design encourages respondents to gravitate toward the first seemingly acceptable response rather than considering the whole list? • V2: People schedule doctor visits for a variety of reasons, including getting medical treatment for a new condition, follow-up care for an existing condition, or a routine checkup. Which of those best describes the main reason for your visit the last time you went to see a doctor?

  14. Results – Experiment 3

      Response categories:          V1 (after q.)   V2 (before q.)   signif.
      New condition                 21.5%           23.6%            n.s.
      Follow-up                     41.0%           34.6%
      Routine exam                  37.4%           41.9%
                                    (n=195)         (n=191)
      Initial response inadequate   10.6%           23.2%            p<.01
                                    (n=141)         (n=142)

  15. Issue #4: Examples vs. definitions to illustrate complex concepts • Complex concepts such as “strenuous activity” are often illustrated through examples: • The next question is about strenuous tasks done around your home. By "strenuous tasks," we mean things like shoveling soil in a garden, chopping wood, major carpentry projects, cleaning the garage, scrubbing floors, or moving furniture. In the past 30 days, on how many days did you do strenuous tasks in or around your home? • Although examples are designed to express a range of possibilities, we hypothesize that they have the opposite effect, focusing attention on a few specifics that might not be well chosen • We expect that a good definition will create higher reports and be easier to administer • However, previous attempts were not successful, presumably because our definition was too complex

  16. Examples vs. definitions • V1: The next question is about strenuous tasks done around your home. By "strenuous tasks," we mean things like shoveling soil in a garden, chopping wood, major carpentry projects, cleaning the garage, scrubbing floors, or moving furniture. In the past 30 days, on how many days did you do strenuous tasks in or around your home? • V2: The next question is about strenuous tasks done around your home. By "strenuous tasks", we mean any chores or projects that made you feel very tired by the time you finished them. In the past 30 days, on how many days did you do strenuous tasks in or around your home?

  17. Results – Experiment 4

                                     V1 (example)   V2 (definition)   signif.
      Strenuous activity/month       4.9            3.9               n.s.
      Reported “zero times”          29.3%          37.7%             p<.1
                                     (n=208)        (n=215)
      Initial response inadequate    27.0%          25.1%             n.s.
                                     (n=153)        (n=159)

  18. Issue #5: Question wording to capture rare events • One reason questions are very complex is that their authors want to prompt respondents to think of a broadly inclusive range of situations: • In the past 12 months, how many times have you seen or talked on the telephone about your physical or mental health with a family doctor or general practitioner? • The practice has a downside: respondents may lose the forest for the trees • Cognitive interview evaluation of the question above suggested that respondents thought it was exclusively about telephone contact with doctors. • If true, the question would generate significant undercounts.

  19. A simplified comparison • “The next question is specifically about primary care doctors….” • V1: In the past 12 months, how many times have you seen or talked on the telephone with a primary care doctor about your health? • V2: In the past 12 months, how many times have you seen or talked with a primary care doctor about your health? • The only difference between these two questions is the inclusion of “on the telephone.”

  20. Results – Experiment 5

                                     V1 (telephone)   V2 (no phone)   signif.
      Mean contacts                  3.4              3.6             n.s.
      “Zero” responses               24.7%            9.2%            p<.01
                                     (n=194)          (n=195)
      Initial response inadequate    14.9%            21.1%           n.s.
      Respondent requested help      5.7%             11.3%           p<.1
                                     (n=120)          (n=121)

  21. Issue #6: Question decomposition • Food consumption example: • “During the last 30 days, how many times did you eat cheese, including cheese as snacks, and cheese in sandwiches, burgers, lasagna, pizza, or casseroles? Do NOT count cream cheese.”

  22. Issue #6: Question decomposition • Food consumption example: • “During the last 30 days, how many times did you eat cheese, including cheese as snacks, and cheese in sandwiches, burgers, lasagna, pizza, or casseroles? Do NOT count cream cheese.” • Clearly a challenging response task in general; we had little confidence in the accuracy of reports • Cognitive testing: when probed about details… • “Did you include cheese in other dishes/sandwiches/etc.?” • (If no) “Would that have changed your overall answer?” …some participants increased their reports

  23. Question decomposition (2) • Alternative: multiple response tasks, divided into reasonable sub-components: The next questions are about cheese you have eaten in the last 30 days. Please do NOT include any cream cheese you may have eaten. • During the last 30 days, how many times have you eaten cheese on a sandwich, including burgers? • During the last 30 days, how many times have you eaten cheese in lasagna, pizza, casseroles, or mixed in with other dishes? • During the last 30 days, how many times have you eaten cheese as a snack or appetizer?

  24. Results – Experiment 6 (responses)

                     V1 (single)   V2 (multi)   signif.
      Mean times     13.9          19.0         p<.01
                     (n=218)       (n=228)

  25. Results – Experiment 6 (behavior coding) • The individual “decomposed” questions consistently outperformed the single item on virtually all measures:

                                      Orig   Alt1   Alt2   Alt3
      Inadequate initial response     15.9   9.9    8.3    3.1
      Probes used                     13.7   7.8    6.3    2.1
      Requested help/repeat           19.1   15.1   3.1    2.1
      (all figures are percentages; most significant at p<.05)

  26. Some other considerations • Mean time to administer the original was 28 seconds; mean for the alternative was 51 seconds • If we compare the amount of probing, inadequate responses, etc. required to reach our desired data points (i.e., across all three questions), the behavior-coding rates become very similar • For example: 13.5% of original questions were probed; 15.1% of respondents to the alternative series were ever probed • Some research suggests that responses to decomposed questions are less accurate (but…) • Next steps: split ballot experiment on various food and exercise questions (global vs. decomposed) with diary validation
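The "ever probed" comparison treats the three-question series as a single unit: a respondent in the decomposed condition counts as probed if any of the three items drew an interviewer probe. A minimal sketch of that computation, with invented toy data:

```python
# Hypothetical sketch of the series-level "ever probed" rate: a respondent
# counts as probed if ANY question in the decomposed series drew a probe.

def ever_probed_rate(probe_flags_per_respondent):
    """Fraction of respondents probed on at least one item in a series.

    `probe_flags_per_respondent` is a list of per-respondent tuples of
    booleans, one flag per question in the series.
    """
    probed = sum(1 for flags in probe_flags_per_respondent if any(flags))
    return probed / len(probe_flags_per_respondent)

# Toy data: 4 respondents x 3 decomposed questions
sample = [(False, False, False), (True, False, False),
          (False, True, False), (False, False, False)]
rate = ever_probed_rate(sample)  # 2 of 4 respondents probed at least once
```

This is why the series-level rate (15.1%) can exceed each per-question rate (7.8%, 6.3%, 2.1%): different respondents are probed on different items.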

  27. Conclusions • Qualifiers and definitions that dangle after the question mark should be avoided, provided there is a reasonable way to do so. • Conventional wisdom about presenting response categories after the question seems to stand. • In spite of our reservations about examples, we have failed to find evidence that they limit the frame of reference. They don’t perform wonderfully, but alternatives don’t do better. • Details in questions have the potential to distract respondents from overall meaning. Additional words may help a few respondents, but simpler wording may have a more profound impact.

  28. Conclusions (2) • Experiments presented here involve single, interviewer-administered questions. • Complexity can often be reduced by asking multiple, smaller questions. • However, the pressure to ask fewer questions is real. Hopefully these results provide some guidance for how to structure questions given such constraints.
