
Matt Jans PhD Candidate Michigan Program in Survey Methodology University of Michigan

Can Speech Disfluency and Voice Pitch Predict Item Non-response (and Accuracy) to Income Questions? Matt Jans, PhD Candidate, Michigan Program in Survey Methodology, University of Michigan. Thanks: US Census Bureau Dissertation Fellowship Program; Cannell Fund in Survey Methodology.


Presentation Transcript


  1. Can Speech Disfluency and Voice Pitch Predict Item Non-response (and Accuracy) to Income Questions? Matt Jans, PhD Candidate, Michigan Program in Survey Methodology, University of Michigan

  2. Thanks • US Census Bureau Dissertation Fellowship Program • Cannell Fund in Survey Methodology • Frederick G. Conrad and James Lepkowski (co-chairs) • Also Frauke Kreuter, Norbert Schwarz, Jack Fowler, Jose Benki • Thanks to all Tallberg, Sweden, June 16, 2009

  3. Total Survey Error

  4. Total Survey Error

  5. Dual Inference Process (Groves, Fowler, Couper, Lepkowski, Singer, and Tourangeau, 2004, Survey Methodology) • Measurement: Construct → [Validity] → Measurement → [Measurement Error] → Response → [Processing Error] → Edited Data • Representation: Population Mean → [Coverage Error] → Sampling Frame → [Sampling Error] → Sample → [Nonresponse Error] → Respondents → [Adjustment Error] → Postsurvey Adjusted Data • Both paths lead to the Survey Statistic

  6. Perspectives on Income Data Quality • Most surveys ask about income • Major demographic variable • Poverty statistics, assets, saving and investment behavior, socio-economic status • More broadly, sensitive and complex questions

  7. Data Quality of Income Reports • High rates of missing data • High inaccuracy • Higher for certain income sources • Missing data appear in all modes but are highest in interviewer-administered modes (De Leeuw, 1992) • Missing data are a problem even if missing completely at random (MCAR) (e.g., smaller sample size, larger SE)
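The MCAR point can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes simple random sampling, and the sample size, standard deviation, and 25% missing rate are hypothetical numbers, not figures from the study.

```python
import math


def se_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    return sd / math.sqrt(n)


def se_inflation(missing_rate: float) -> float:
    """Ratio of the complete-case SE to the full-sample SE when data are MCAR."""
    if not 0 <= missing_rate < 1:
        raise ValueError("missing_rate must be in [0, 1)")
    return 1 / math.sqrt(1 - missing_rate)


# Hypothetical values: 1,000 sampled cases, income SD of 30,000,
# and 25% item nonresponse on the income question.
n_full = 1000
n_observed = 750
full_se = se_of_mean(30_000, n_full)
observed_se = se_of_mean(30_000, n_observed)
inflation = se_inflation(0.25)

print(round(full_se, 1), round(observed_se, 1), round(inflation, 4))
```

Even with no bias at all, losing a quarter of the cases widens the standard error by roughly 15% in this hypothetical setup.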

  8. Paradata • Data about the response process, or the data collection process more generally, which may not be routinely collected (Couper, 1998) • Production data • Interviewer response rates, interview durations (Safir, Black, & Steinbach, 2001) • Conversation involved in the Q-A process (Maynard, Houtkoop-Steenstra, Schaeffer, & van der Zouwen, 2002; Schober & Bloom, 2004) • Qualities of voice and speech (Conrad, Schober, & Dijkstra, 2008; Groves et al., 2008)

  9. Respondent Paradata • Focus on Respondent Verbal Paradata • Speech Characteristics • Pauses • Fillers • “ummm”, “uhhh” • Repairs • “The data is…I mean, the data are conclusive” • Reports • “I make $15 an hour” or “I do alright”

  10. Respondent Paradata • Focus on Respondent Verbal Paradata • Acoustic Voice Characteristics • Voice pitch and pitch variation (e.g., fundamental frequency, f0) • Also interesting, but not in dissertation • Volume (intensity) • Pleasantness
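As a rough illustration of the two pitch measures named above, the sketch below summarizes a frame-level f0 track as a mean and a standard deviation. The track values, the convention that unvoiced frames are `None` or `0`, and the function name `pitch_summary` are all assumptions made for this example; the slides do not describe the actual extraction procedure.

```python
import statistics


def pitch_summary(f0_hz):
    """Return (mean, SD) of f0 in Hz over voiced frames only.

    Unvoiced frames are assumed to be coded as None or 0 (hypothetical
    convention). Returns None if fewer than two voiced frames exist.
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if len(voiced) < 2:
        return None
    return statistics.mean(voiced), statistics.stdev(voiced)


# Hypothetical f0 track for one respondent utterance (Hz per frame).
track = [0, 110.0, 112.0, None, 118.0, 120.0, 0]
mean_f0, sd_f0 = pitch_summary(track)
print(mean_f0, round(sd_f0, 2))
```

Dropping unvoiced frames before summarizing matters: leaving zeros in would pull the mean down and inflate the SD for reasons unrelated to the speaker's pitch.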

  11. Survey Methodology Uses Verbal Paradata • Respondent disfluencies suggest cognitive difficulty with questions (Conrad, Schober, & Dijkstra, 2008; Maynard, Houtkoop-Steenstra, Schaeffer, & van der Zouwen, 2002; Schober & Bloom, 2004) • Interviewer voice qualities correlate with unit response rates (Oksenberg & Cannell, 1988; Groves et al., 2008) • Higher pitch • Pitch variability (standard deviation) • Breathiness

  12. From Survey Methodology to Psycholinguistics and Psychology • Other fields have used verbal paradata (e.g., voice pitch, rate of speech, disfluencies) to study psychological states (Bachorowski, 1999; Banse & Scherer, 1996) • Verbal paradata = Psychological states • Income reporting mechanisms are psychological, so… • …We should take lessons and methods from these other fields

  13. Mechanisms of Poor Income Data Quality • Beatty & Herrmann’s (2002) two-cause model of item nonresponse • Cognitive Difficulty • (Question complexity or mapping to respondent’s situation) • Motivation • Sensitivity • May apply to accuracy as well

  14. Paradata are Measures of Underlying Constructs • [Path diagram] • Indicators of Resp’t Cog Diff: Resp’t Disfluency, Resp’t Report, Coded Resp’t Cognitive Problem • Indicators of Resp’t Affect: Resp’t Voice Pitch, Resp’t Voice Pitch SD, Coded Resp’t Affect • Outcome: Income Data Quality

  15. Research Questions • Do respondents’ verbal paradata predict income nonresponse and inaccuracy? • How well do verbal paradata measures represent constructs we think cause poor income data quality? • Do findings differ between income nonresponse and income inaccuracy?

  16. Study Design • Digital recordings of telephone interviews (Surveys of Consumers, University of Michigan/Reuters) • Selected based on income response • Nonrespondent, Bracketed respondent, Dollar amount • Income question and 4 prior questions • Sensitive and Complex • Sensitive and Not Complex • Not Sensitive and Complex • Not Sensitive and Not Complex

  17. Current Analyzable Sample • 159 Respondents • 795 Questions (5 questions each) • About 8000 I and R Utterances (variable across items) • Utterance-level coding scheme • More than just behavior coding

  18. Coder Reliability (training) • Question-Answer Behaviors • Kappa = .60-.76 • Speech Disfluencies and Reports • .51-.89 • Rating of psychological states • .22-.31 (anxiety), .34-.59 (cognitive difficulty) • Pitch measures mechanically coded, no reliability measure • Retrained and revised coding of affect and cognitive difficulty
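For readers unfamiliar with kappa, here is a minimal sketch of how a chance-corrected agreement statistic of this kind is computed. It assumes Cohen's kappa for two coders; the slides do not say which kappa variant was used or how coder pairs were formed, and the code labels below are made up for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one label per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    # Observed proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's marginal label rates.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten utterances (F = filler, P = pause, R = repair).
a = ["F", "F", "F", "F", "P", "P", "P", "P", "R", "R"]
b = ["F", "F", "F", "P", "P", "P", "P", "R", "R", "R"]
kappa = cohens_kappa(a, b)
print(round(kappa, 3))
```

In this toy example the coders agree on 8 of 10 utterances, but because about a third of that agreement is expected by chance, kappa comes out near .70 instead of .80.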

  19. Abuse!!! • Only one measure of data quality • Item nonresponse • Complications with validation data • Scarcity of data sets with recordings AND validation • Will improve over time

  20. Predictions • Indicators of affect will be higher on sensitive than non-sensitive questions • Rated affect, voice pitch • Indicators of cognitive difficulty will be higher on cognitively complex questions than non-complex questions • Rated difficulty, pauses, fillers, reports, repairs, answers with qualifications • Expect differences by income refusal type

  21. SCA Analytic Design • Repeated measures ANOVA • Two within-subjects factors • Sensitivity (2) • Complexity (2) • One between-subjects factor • Income respondent type (3) • Income nonresponse • Bracketed response • Income dollar amount
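The degrees of freedom implied by this design can be checked with a short sketch. It assumes a balanced, complete-case mixed design using all 159 respondents from the analyzable sample; the function name and effect labels are hypothetical. Under those assumptions the results agree with the F(1,156) and F(2,156) values reported on the results slides.

```python
from itertools import combinations
from math import prod


def mixed_anova_df(n_subjects, n_groups, within):
    """Effect and error df for one between-subjects factor plus within factors.

    `within` maps each within-subjects factor name to its number of levels.
    Returns {effect label: (effect df, error df)}.
    """
    between_error = n_subjects - n_groups
    table = {"group": (n_groups - 1, between_error)}
    names = list(within)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # df of a within effect is the product of (levels - 1).
            w = prod(within[name] - 1 for name in combo)
            label = " x ".join(combo)
            table[label] = (w, w * between_error)
            table["group x " + label] = ((n_groups - 1) * w, w * between_error)
    return table


df = mixed_anova_df(159, 3, {"sensitivity": 2, "complexity": 2})
for effect, (effect_df, error_df) in df.items():
    print(f"{effect}: F({effect_df},{error_df})")
```

Because both within-subjects factors have only two levels, every within effect has 1 numerator df and every effect involving the three-level respondent type has 2.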

  22. Questions Coded • Currently analyzing Q1-Q4 • First analytic pass, a few more cases coming

  23. Sensitivity and Complexity • More positive affect on noncomplex than complex questions (F(1,156)=7.027, p=.005)

  24. Sensitivity and Complexity • Higher affect intensity on nonsensitive (F(1,156)=150.45, p<.0005)

  25. Sensitivity and Complexity • Higher affect intensity on noncomplex questions (F(1,156)=17.27, p<.0005)

  26. Sensitivity and Complexity • Cognitive Difficulty 3-way interaction (F(2,156)=3.03, p=.051)

  27. Differences by Income Refusal Type • Higher (more positive) affect in income respondents, moderate in bracketed Rs, lowest in income nonrespondents (F(2,156)=14.59, p<.0005)

  28. Differences by Income Refusal Type • Most rated difficulty in bracketed R’s, Least in income NR’s (F(2,156)=20.24, p<.0005)

  29. Differences by Respondent Type • More negative comments from income nonrespondents (F(2,156)=4.32, p=.015)

  30. Unexpected Findings • More backchannels in nonsensitive items (F(1,156)=11.347, p=.001)

  31. Unexpected Findings • More backchannels in cognitively complex items (F(1,156)=3.66, p=.058)

  32. Unexpected Findings • More conversation management in nonsensitive questions (F(1,156)=6.98, p=.009)

  33. Other Analytic Avenues • Other question characteristics • Open v. Close-ended response • Self v. Other referent • Order in the questionnaire • Interviewer effects

  34. Bigger Picture: Latent Variable Model • [Path diagram] • Resp’t Cog Diff measured by: Resp’t Disfluency (λ1,1), Resp’t Report (λ2,1), Coded Resp’t Cognitive Problem (λ3,1) • Resp’t Affect measured by: Resp’t Voice Pitch (λ4,2), Resp’t Voice Pitch SD (λ5,2), Coded Resp’t Affect (λ6,2) • Paths to Income NR: from Resp’t Cog Diff (λ7,1) and from Resp’t Affect (λ8,2)
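One way to read the path diagram is as a two-factor measurement model with an outcome equation. The equations below are a sketch under that reading, not the author's stated specification: the symbols x, ξ, δ, and ζ are introduced here for illustration, and the assignment of each indicator to a loading follows the subscripts shown on the slide (second subscript 1 for cognitive difficulty, 2 for affect).

```latex
% Measurement part: \xi_1 = respondent cognitive difficulty, \xi_2 = respondent affect
\begin{align*}
x_{\text{disfluency}}         &= \lambda_{1,1}\,\xi_1 + \delta_1 \\
x_{\text{report}}             &= \lambda_{2,1}\,\xi_1 + \delta_2 \\
x_{\text{coded cog. problem}} &= \lambda_{3,1}\,\xi_1 + \delta_3 \\
x_{\text{voice pitch}}        &= \lambda_{4,2}\,\xi_2 + \delta_4 \\
x_{\text{voice pitch SD}}     &= \lambda_{5,2}\,\xi_2 + \delta_5 \\
x_{\text{coded affect}}       &= \lambda_{6,2}\,\xi_2 + \delta_6 \\
% Outcome part: income nonresponse as a function of both latent constructs
y_{\text{income NR}}          &= \lambda_{7,1}\,\xi_1 + \lambda_{8,2}\,\xi_2 + \zeta
\end{align*}
```

Because income nonresponse is categorical, a fitted model would likely need a link function (for example, a probit or logit) on the last equation; the linear form is shown only to make the role of each coefficient visible.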

  35. Thanks! • mattjans@isr.umich.edu • sitemaker.umich.edu/mattjans

  36. Dissertation Design: Sequence Viewer

  37. Dissertation Design: Praat

  38. S & C Question (Q2) • “During the next year or two, do you expect that your (family) income will go up more than prices will go up, about the same, or less than prices will go up?”

  39. S & NC Question (Q3) • “During the next 12 months, do you expect your (family) income to be higher or lower than during the past year? By about what percent do you expect your (family) income to (increase/decrease) during the next 12 months?”

  40. NS & C Question (Q1) • “What about the outlook for prices over the next 5 to 10 years? Do you think prices will be higher, about the same, or lower, 5 to 10 years from now? Do you mean that prices will go up at the same rate as now, or that prices in general will not go up during the next 5 to 10 years? By about what percent per year do you expect prices to go (up/down) on the average, during the next 5 to 10 years? How many cents on the dollar per year do you expect prices to go (up/down) on the average, during the next 5 to 10 years?”

  41. NS & NC Question (Q4) • “Speaking now of the automobile market - do you think the next 12 months or so will be a good time or a bad time to buy a vehicle, such as a car, pickup, van, or sport utility vehicle? Why do you say so? Are there any other reasons?”

  42. SCA Income Question • “To get a picture of people's financial situation we need to know the general range of income of all people we interview. Now, thinking about (your/your family's) total income from all sources (including your job), how much did (you/your family) receive in [PREV YEAR]?”

  43. SCA Income Question • “Income may affect the way people view the economy and that is why we ask for the general range of income of everyone we interview. We have some range categories if you prefer.” • IWR WAITS FOR ACCEPTANCE • “Was your income in [PAST YEAR] above fifty-thousand dollars?”

  44. Dissertation Design: HRS • Health and Retirement Study • Face-to-face and telephone • US population age 65+ • Social Security data available for validation

  45. Dissertation Design: HRS Income Item • Social Security Income from previous month • Open-ended with brackets • Reported amount validated against Social Security records

  46. Dissertation Design: SCA • University of Michigan/Reuters Surveys of Consumers • Telephone, RDD, US general population • Digital recordings of every call • Open-ended and bracketed income item • Relatively high income nonresponse (26% in some months)

  47. Dissertation Design: SCA Income Item • Household income from all sources • Open-ended, with brackets for nonrespondents • Three categories of response: full dollar amount, bracketed amount, nonresponse • “Now, thinking about (your/your family's) total income from all sources (including your job), how much did (you/your family) receive in [PREV YEAR]?”

  48. Dissertation Design: SCA Comparison Items • Additional SCA Items • Four items prior to income • Rated for sensitivity and complexity • Selected to approximate experimental conditions • Most sensitive AND complex • Least sensitive AND complex • Most sensitive and least complex • Least sensitive and most complex

  49. Utterance-level Coding Scheme • Codes • Question asking/answering behavior • Natural communication behaviors • Speech disfluency, reports • Judgments of psychological states • Team • 10 undergraduates at the University of Michigan • Sequence Viewer software used

  50. Utterance-level Coding Scheme • Speaker • Behavior (answering question, requesting clarification, etc) • Repairs and stammers • Reports • Affect intensity and valence • Cognitive difficulty
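The fields listed above suggest a simple record per utterance. The sketch below shows one possible layout and how utterance-level codes could be rolled up to the question level for analysis. Field names, value types, rating scales, and the sample rows are all assumptions for illustration; they are not the coding scheme's actual variable definitions or the Sequence Viewer export format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    respondent_id: int
    question: str               # e.g. "Q1".."Q4" or "income" (hypothetical labels)
    speaker: str                # "I" = interviewer, "R" = respondent
    behavior: str               # e.g. "answer", "request_clarification"
    repairs_and_stammers: int   # count within the utterance
    report: bool                # whether the utterance contains a report
    affect_intensity: int       # hypothetical rating scale
    affect_valence: int         # hypothetical rating scale
    cognitive_difficulty: int   # hypothetical rating scale


def repairs_per_question(utterances):
    """Sum respondent repairs and stammers per (respondent, question)."""
    totals = defaultdict(int)
    for u in utterances:
        if u.speaker == "R":
            totals[(u.respondent_id, u.question)] += u.repairs_and_stammers
    return dict(totals)


# Made-up utterances for a single respondent.
sample = [
    Utterance(1, "Q1", "I", "ask_question", 0, False, 0, 0, 0),
    Utterance(1, "Q1", "R", "request_clarification", 1, False, 1, 0, 2),
    Utterance(1, "Q1", "R", "answer", 2, True, 1, 0, 1),
    Utterance(1, "Q2", "R", "answer", 0, False, 2, -1, 0),
]
totals = repairs_per_question(sample)
print(totals)
```

Aggregating to one row per respondent and question matches the unit of analysis implied by the 795 questions (159 respondents, 5 questions each) in the analyzable sample.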
