Learn how Statistics Canada evaluates paradata sources to improve data collection in Computer Assisted Personal Interviews (CAPI) for social surveys. Understand the impact on research scope, production, and cost. Explore dimensions of quality, survey productivity indicators, and more.

  1. Assessing Quality of Paradata to Better Understand the Data Collection Process for CAPI Social Surveys • François Laflamme, Milana Karaganis • European Conference on Quality and Methodology in Official Statistics, Helsinki, May 2010 • Statistics Canada • Statistique Canada

  2. Outline • Introduction • Paradata Sources • CAPI Environment • Quality • 6 dimensions • Paradata Quality • Initial Research on Data Collection Process • Reaching Respondents • Productivity and Cost Relationship • Summary and Next Steps

  3. Introduction • Data collection organization • Statistics Canada has both CATI and CAPI interviewers • Responsible for data collection - no sub-contracting • Face-to-face interviews (CAPI) • About 2-3 concurrent CAPI surveys each month • ~100,000 monthly attempts • January 2009 - 7 concurrent CAPI surveys • Initial research objectives • Assess the quality of paradata and its impact on the scope of possible research and resulting conclusions • Better understand the data collection process and practices

  4. Paradata Sources • Attempts and contact information for Computer Assisted Personal Interview (CAPI) surveys • Attempt = visit or call • Administrative and payroll information • Extracted from the Statistics Canada paradata database • Historical information since 2003 • Updated on a daily basis • Targeted CAPI surveys • Canadian Community Health Survey (CCHS), Survey of Household Spending (SHS), Labour Force Survey (LFS)

  5. CAPI Environment • CAPI interviewers • Work independently from their homes • Assigned a set of cases to complete within a specified period of time • Requested to record all attempts at the time they were made • Most paradata captured automatically, except • Attempts outcome - tel. call/personal visit flag • Pay information (hours worked, km, fees) • Daily transmission of production and cost data • Every day worked

  6. Quality - 6 Dimensions (QAF) • Relevance • Describe collection processes • Meet research needs (e.g. number and scope of research) • Timeliness • Day after paradata received • Accessibility • Owner of information • Sensitive and confidential information controlled (e.g. interviewer ID) • Accuracy • Coherence • Interpretability • Focus on the last 3 dimensions

  7. Paradata Quality • Attempts with long duration - potential outliers • Application left open by interviewers • Cap at 2.5 hours - less than 0.5% of attempts • Short interviews • Less than 0.5%-0.9% of interviews • Pattern of attempts • Lag between logged attempts was in line with the expected time required to move between cases • Undercoverage of attempts • Number of attempts recorded by type of case (i.e. respondents, non-respondents, voids) is comparable to a US CAPI survey • Personal visits vs telephone calls (production only) • Proportion of tel. calls varies from 25% to 40% depending on the survey • Impact on any type of geographic, productivity or cost analysis • More investigation required
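
The duration cap on this slide can be sketched as follows. This is only a minimal illustration, not Statistics Canada's actual processing: the function name `cap_durations`, the use of minutes as the unit, and the sample durations are all assumptions made for the example. Only the 2.5-hour cap comes from the slide.

```python
# Hypothetical sketch: cap attempt durations at 2.5 hours and report
# what share of attempts was affected. Names and data are illustrative.
CAP_MINUTES = 150.0  # 2.5 hours, the cap quoted on the slide


def cap_durations(durations_min, cap=CAP_MINUTES):
    """Return (capped durations, share of attempts that exceeded the cap)."""
    capped = [min(d, cap) for d in durations_min]
    n_over = sum(1 for d in durations_min if d > cap)
    share_over = n_over / len(durations_min) if durations_min else 0.0
    return capped, share_over


# Made-up attempt durations in minutes; 600.0 mimics an application
# left open by the interviewer.
durations = [12.0, 45.5, 3.0, 600.0, 20.0]
capped, share_over = cap_durations(durations)
print(capped)
print(f"{share_over:.1%} of attempts capped")
```

On real paradata the capped share would be compared with the "less than 0.5% of attempts" figure reported above.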

  8. Paradata Quality • Production and payroll data consistency (at interviewer-day level) • About 70% of records on both files • About 80%-85% on both files and production • Representing over 90% of the system time (production) • Over 90% on both files and payroll • Representing over 95% of payroll hours • Personal/telephone information on production and payroll • About 75% coherence • Travel code on payroll data (at interviewer-day level) • In general, about 85%-90% of interviewers reported travel with a CAPI interview • Varies by RO
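
One way to read the interviewer-day consistency figures above is as match rates between the two files, keyed on (interviewer, date). The sketch below follows that reading. It is an assumption about how the percentages were derived, not the authors' documented method, and the function name, keys and sample records are hypothetical.

```python
# Hypothetical sketch: match production and payroll records at the
# interviewer-day level and report three match rates.
def match_rates(production_keys, payroll_keys):
    """Each key is an (interviewer_id, date) pair; returns shares in [0, 1]."""
    prod, pay = set(production_keys), set(payroll_keys)
    both = prod & pay
    union = prod | pay

    def share(part, whole):
        return len(part) / len(whole) if whole else 0.0

    return {
        "both_of_all": share(both, union),        # on both files
        "both_of_production": share(both, prod),  # production days with payroll
        "both_of_payroll": share(both, pay),      # payroll days with production
    }


# Made-up interviewer-day records.
production = [("A", "2009-01-05"), ("A", "2009-01-06"),
              ("B", "2009-01-05"), ("C", "2009-01-07")]
payroll = [("A", "2009-01-05"), ("B", "2009-01-05"),
           ("C", "2009-01-07"), ("C", "2009-01-08")]
rates = match_rates(production, payroll)
print(rates)
```

The same matched set could then be weighted by system time or payroll hours to obtain the "representing over 90%" style of figure.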

  9. Initial Research on Data Collection Process • Reaching respondents • Contact rate • Best interview time • Contact vs interview • Production and cost relationship • Survey productivity indicators • Interaction between surveys

  10. Contact Rates for First Attempt • Best time to contact: early evening but… • Consistent with the pattern seen in CATI social surveys • Surprisingly, the shape of this graph varies by survey and even by survey cycle - not stable • Depending on the interaction between surveys and between telephone and personal attempts?

  11. Interview • When interviews are conducted • Peak periods: 10:00-11:00, 13:00-15:00, 18:00-20:00 • Very similar by survey and survey cycle

  12. First Contact and First Appointment versus Interview • ~38% of respondents reached at the first attempt • ~45% of respondents reached on the day of the first attempt • ~53% of respondents required at least one appointment prior to interview • ~60% of interviews completed within 2 attempts • Note that the distribution of the lag in days is much more uniform, suggesting that interviewers are likely to spread appointments across the collection period
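
Indicators like those above can be derived from attempt histories. The sketch below assumes each responding case is summarized by the attempt number of its first contact and the attempt number at which the interview was completed. That record layout, the function name and the sample cases are illustrative assumptions, not the structure of the Statistics Canada paradata database.

```python
# Hypothetical sketch: share of respondents reached at the first attempt
# and share of interviews completed within two attempts.
def reach_indicators(cases):
    """cases: list of (first_contact_attempt, interview_attempt) pairs."""
    n = len(cases)
    if n == 0:
        return {"reached_first_attempt": 0.0, "done_within_two": 0.0}
    reached_first = sum(1 for contact, _ in cases if contact == 1)
    done_within_two = sum(1 for _, interview in cases if interview <= 2)
    return {
        "reached_first_attempt": reached_first / n,
        "done_within_two": done_within_two / n,
    }


# Made-up responding cases: (first contact attempt, interview attempt).
cases = [(1, 1), (1, 3), (2, 2), (3, 4), (1, 2)]
indicators = reach_indicators(cases)
print(indicators)
```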

  13. Relationship between Production and Cost • Good relationship between production (system time) and payroll hours throughout survey cycle

  14. Survey Productivity Indicators • Daily Productivity Indicators • Productivity ratios are relatively stable during collection period - except at the end; different for CATI surveys • Total System Time / Total Payroll Hours • 20%-30% CAPI vs. 60%-70% CATI • Complete Interview System Time / Total System Time • 60%-80% CAPI vs. 30%-60% CATI • These ratios are affected by interview length and response rate
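
The two ratios on this slide are simple quotients of daily totals. A minimal sketch, assuming all three inputs are expressed in hours for the same survey and day; the function name and the sample figures are hypothetical.

```python
# Hypothetical sketch of the two daily productivity ratios named on the slide.
def productivity_ratios(complete_interview_system_time,
                        total_system_time,
                        total_payroll_hours):
    """All inputs in hours for one survey-day; returns the two ratios."""
    if total_system_time <= 0 or total_payroll_hours <= 0:
        raise ValueError("system time and payroll hours must be positive")
    return {
        # Total System Time / Total Payroll Hours
        "system_over_payroll": total_system_time / total_payroll_hours,
        # Complete Interview System Time / Total System Time
        "complete_over_system": complete_interview_system_time / total_system_time,
    }


# Made-up daily totals, chosen to fall inside the CAPI ranges quoted above.
ratios = productivity_ratios(
    complete_interview_system_time=17.5,
    total_system_time=25.0,
    total_payroll_hours=100.0,
)
print(ratios)
```

Computing the ratios per day, rather than once per cycle, is what makes the "stable except at the end" pattern visible.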

  15. Interaction Between Surveys • The proportion of interviewers that work on more than one survey on a given day varies over time • Proportion affected by interview workload distribution and field collection process and practices • Sample coordination initiative

  16. Summary • CAPI paradata • Good quality but more ‘noise’ than for CATI • Good relationship between production and cost indicators • Effort is evenly distributed throughout the day and collection period - different for CATI surveys • Productivity stable throughout collection period • Interaction between surveys varies over time • Next Steps • Continue to assess data limitations and their impact • Interaction between personal and telephone attempts • Include geography workload characteristics • Evaluate new initiatives: sample coordination • Identify ‘viable’ operational efficiency opportunities

  17. For more information, please contact François Laflamme, francois.laflamme@statcan.gc.ca

  18. Average Number of Attempts by Final Status of Cases • Statistics Canada surveys are comparable in terms of the number of attempts required to resolve cases - except for LFS • Comparison with a US survey suggests no (or low) undercoverage in terms of attempts recorded

  19. Distribution of Cases, Respondents, Attempts and System Time by Total Number of Attempts • The proportion of cases and system time for cases that required 6 attempts or more is about the same • The ratio %respondents / %cases is still high for cases with 6 attempts or more - but lower than for other types of cases • Very different for CATI surveys

  20. Production and Cost Concepts • Production (System Time) • Complete Interview System Time: System time to complete interviews • Total System Time: Total system time, including all types of attempts • Costs (Payroll Hours) • Direct Collection Payroll Hours: Time charged to conduct direct collection activities (including travel time) • Total Payroll Hours: Total time charged • Includes administration, data transmission time, etc.

  21. Effort and Productivity by Period of the Day • Effort (system time) is relatively evenly distributed throughout the day - no peak in the evening • Productivity seems to be stable throughout the day

