Using Paradata to Monitor and Improve the Collection Process in Annual Business Surveys By Sylvie DeBlois, Statistics Canada Rose-Carline Evra, Statistics Canada ICES-III, Montreal, June 19th, 2007
OUTLINE • Introduction • Score Function • Paradata • Score Function Recent Update • Future Developments
Introduction • The Unified Enterprise Survey (UES) is an annual economic survey of financial and characteristic variables, conducted by Statistics Canada since 1998. It combines many surveys. • Average collection period: February to early October • Collection processing system: Blaise • More than 48,000 questionnaires each year.
UES Questionnaire • UES includes Services, Trades, Manufactures, Agriculture (aquaculture) and Transportation (couriers and taxi & limousine) surveys. • A questionnaire has about 7 to 10 sections (the number of sections varies depending on the survey): • Introduction (Stats Act - Confidentiality, Respondent info) • Revenue • Expenses • Events that may have affected business units • … • Comments
Introduction • Collection Process: • Mail-out of questionnaires • Follow-up in case of non-response for some units / Mail-back of questionnaires • Verification of received questionnaires / Edits • Coding of questionnaires • Imaging & Data Capture • During the collection period, follow-up is sometimes required because of non-response. The score function is used to determine each enterprise's follow-up priority.
Introduction • Collection follow-up tool: Score function (SF) • Annual Survey of Manufactures (ASM) score function • Non-ASM score function • Each score function has its own way of calculating scores, defining cells and assigning priorities. • This presentation focuses mainly on the Non-ASM score function.
Score Function • Reduces collection costs while retaining data quality. • In line with the collection goal of obtaining a high weighted coverage response rate. • PRIORITY 1: Extensive follow-up for the larger-revenue Collection Entities (CEs) in cases of non-response. • PRIORITY 0: Minimum follow-up for the smaller CEs in cases of non-response.
Useful definitions • Cell • Sampling Unit (part of the enterprise within the cell) • Establishment • NAICS: North American Industry Classification System (5-digit number) • [Diagram: a cell with NAICS = YYYYY and PROV = AA; elements labelled A, B, C, D, E]
Method: Initial Scores • Within each cell, calculate the score for each UES sampling unit (SU). • Score = the sample-weighted revenue of the SU as a percentage of the cell’s total revenue. • Sample weight: UES sampling weight • Revenue: sampling revenue
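The score can be written compactly as shown below. The notation is introduced here for illustration: $w_i$ is the UES sampling weight and $r_i$ is the sampling revenue of SU $i$. The formula assumes that the cell's total revenue is the weighted sum over all SUs in cell $c$:

```latex
s_i = 100 \times \frac{w_i \, r_i}{\sum_{j \in c} w_j \, r_j}
```

Under this reading, the scores of all SUs in a cell sum to 100.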
Method: Initial Scores • Cell: • For Distributive Trades & Aquaculture: NAICS × Province • For Transportation: NAICS × Province × Stratum (Take-All / Take-Some) • For Services: NAICS × Province × Stratum (TA / TS) × Type of questionnaire (long / characteristic)
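The cell definitions above can be sketched as a key-building function. The function name, the survey-group labels (`trade`, `aquaculture`, `transportation`, `services`) and the `TA`/`TS` codes are hypothetical. The sketch only mirrors the three rules listed on the slide.

```python
from typing import Optional, Tuple


def cell_key(survey: str, naics: str, prov: str,
             stratum: Optional[str] = None,
             questionnaire_type: Optional[str] = None) -> Tuple[str, ...]:
    """Return the tuple identifying an SU's score-function cell (hypothetical helper)."""
    if survey in ("trade", "aquaculture"):
        # Distributive Trades & Aquaculture: NAICS x Province
        return (naics, prov)
    if survey == "transportation":
        # Transportation: NAICS x Province x Stratum (Take-All / Take-Some)
        if stratum not in ("TA", "TS"):
            raise ValueError("transportation cells need stratum 'TA' or 'TS'")
        return (naics, prov, stratum)
    if survey == "services":
        # Services: NAICS x Province x Stratum x type of questionnaire
        if stratum not in ("TA", "TS"):
            raise ValueError("services cells need stratum 'TA' or 'TS'")
        if questionnaire_type not in ("long", "characteristic"):
            raise ValueError("services cells need type 'long' or 'characteristic'")
        return (naics, prov, stratum, questionnaire_type)
    raise ValueError(f"unknown survey group: {survey!r}")
```

Using a tuple as the key lets SUs be grouped into cells with an ordinary dictionary, whatever the number of dimensions.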
Method: Initial Scores • Within each cell: • Sort SUs by descending score. • Cumulate scores until the survey’s target coverage threshold is reached; these SUs are Priority 1, and the rest are Priority 0.
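The following is a minimal Python sketch of the initial prioritization within one cell. It assumes the input is each SU's weighted revenue (sampling weight × sampling revenue) and that the threshold is a percentage. The slide does not say whether the SU that crosses the threshold is Priority 1; this sketch assumes it is.

```python
from typing import Dict


def initial_priorities(weighted_revenue: Dict[str, float], target: float) -> Dict[str, int]:
    """Assign Priority 1 / Priority 0 to the SUs of one cell (sketch).

    weighted_revenue: SU id -> UES sampling weight * sampling revenue.
    target: the survey's coverage threshold, in percent (e.g. 65.0).
    """
    total = sum(weighted_revenue.values())
    if total <= 0:
        raise ValueError("cell has no weighted revenue")
    # Score = SU's share of the cell's total weighted revenue, in percent.
    scores = {su: 100.0 * wr / total for su, wr in weighted_revenue.items()}
    priorities: Dict[str, int] = {}
    cumulative = 0.0
    # Walk the SUs from largest to smallest score, cumulating toward the target.
    for su in sorted(scores, key=scores.get, reverse=True):
        # Assumption: the SU that crosses the threshold is still Priority 1.
        priorities[su] = 1 if cumulative < target else 0
        cumulative += scores[su]
    return priorities
```

For example, four hypothetical SUs with weighted revenues 300, 100, 50 and 50 have scores 60, 20, 10 and 10. With a 65% target, the first two become Priority 1.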
Method: Dynamic Scores • Twice a week during the collection process, we: • receive updated response codes; • recalculate the scores within each cell (i.e. make them dynamic) to update priorities; • update the priorities in Blaise, the collection tool.
Method: Dynamic Scores • As collection proceeds: • Responding (received or completed) questionnaires count toward the cell threshold. • Non-response questionnaires contribute nothing to the threshold. • Out-of-scope (OOS) questionnaires are removed entirely from the cell, which reduces the cell’s revenue total. • In-progress questionnaires are still being collected (including appointments).
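The contribution rules above can be expressed as a small function. The status labels (`received`, `completed`, `in_progress`, `nonresponse`, `oos`) and the function name are hypothetical. The function returns two values: the cell total after removing out-of-scope units, and the percentage of that total already covered by responses.

```python
from typing import Dict, Tuple

# Hypothetical status labels; the slides name the categories, not the codes.
RESPONSE = {"received", "completed"}


def cell_progress(units: Dict[str, Tuple[str, float]]) -> Tuple[float, float]:
    """Return (cell total excluding OOS, percent of that total already covered).

    units: SU id -> (status, weighted revenue), where status is one of
    'received', 'completed', 'in_progress', 'nonresponse', 'oos'.
    """
    # Out-of-scope units are removed entirely, shrinking the cell total.
    total = sum(wr for status, wr in units.values() if status != "oos")
    # Only responses count toward the threshold; non-response and
    # in-progress units contribute nothing (yet).
    covered = sum(wr for status, wr in units.values() if status in RESPONSE)
    return total, 100.0 * covered / total
```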
During Collection • New total weighted revenue for the CELL (excluding OOS units). • Priority 1s or 0s that are received or completed contribute to reaching the CELL threshold. • Example cell (diagram): CELL XXXXXXXX, Total: 475,000k • Threshold = 65% (308,750k) • Received or Completed: 15% reached • Left to do: 50% • Other categories shown: Priority 1 In progress, Priority 0 In progress, Non-response, OOS (50,000k)
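The arithmetic in the example cell (total 475,000k, threshold 65%) can be checked as follows. This assumes that "left to do" means the threshold minus the share already reached:

```latex
\begin{aligned}
\text{Threshold}  &= 0.65 \times 475{,}000\text{k} = 308{,}750\text{k} \\
\text{Reached}    &= 0.15 \times 475{,}000\text{k} = 71{,}250\text{k} \\
\text{Left to do} &= 65\% - 15\% = 50\% \;\Rightarrow\; 0.50 \times 475{,}000\text{k} = 237{,}500\text{k}
\end{aligned}
```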
Method: Dynamic Scores • Has the cell reached its threshold? • If yes, stop follow-up. • If no, recalculate scores using the In-progress units and the remaining threshold. • Some cells must close because too few In-progress questionnaires remain. • Some In-progress Priority 0s may be promoted to Priority 1.
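The recalculation step can be sketched under the following assumptions:
- The inputs are the In-progress units' weighted revenues, the OOS-free cell total, the threshold and the percentage already covered.
- A cell is treated as closed only when no In-progress units remain.

The function name and the `None` return convention are hypothetical.

```python
from typing import Dict, Optional


def reprioritize(in_progress: Dict[str, float], total: float,
                 target: float, covered_pct: float) -> Optional[Dict[str, int]]:
    """Recompute priorities for the In-progress SUs of one cell (sketch).

    in_progress: SU id -> weighted revenue of units still being collected.
    total: cell total weighted revenue, excluding OOS units.
    target, covered_pct: threshold and share already reached, in percent.
    Returns None when follow-up stops for the cell.
    """
    remaining = target - covered_pct
    if remaining <= 0:
        return None  # threshold reached: stop follow-up
    if not in_progress:
        return None  # assumption: the cell closes when nothing is in progress
    # Rescore In-progress units against the OOS-free cell total.
    scores = {su: 100.0 * wr / total for su, wr in in_progress.items()}
    priorities: Dict[str, int] = {}
    cumulative = 0.0
    for su in sorted(scores, key=scores.get, reverse=True):
        # Priority 1 until the remaining threshold is covered; an SU that
        # was Priority 0 before can be promoted here.
        priorities[su] = 1 if cumulative < remaining else 0
        cumulative += scores[su]
    return priorities
```

Consider the example cell's figures: total 475,000k, 15% reached and a 65% threshold. The remaining target is then 50%, so In-progress units are set to Priority 1 in descending order of score until they cover 50 points of the cell total.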
Paradata • Definition: All variables directly related to the data collection process • Currently used: • Response code • Appointment reason (edit – data collection) • Appointment date (recently added) • Currently used only by the Annual Survey of Manufactures (ASM): • Number of attempts, commodity revenue and shipment revenue • Could possibly be used: • Type of contact with the respondent • Previous year’s response code • Type of reminder sent / Date / # (mail, remail,…) • Others
Score Function Recent Update • A recent study examined the impact of appointments on the response rate (reference year 2003). • Following its findings, the “appointment date” was added as paradata to the score function.
Appointments: The Study • During the collection period, an appointment might be scheduled with the respondent. • “Does having an appointment affect the response rate?” • Note: When an appointment is made for a Priority 1 questionnaire, the questionnaire remains in the SF as Priority 1 with “still in progress” status. Therefore, no Priority 0 questionnaire is promoted to Priority 1 to replace it.
Response Rates: Appointment versus No Appointment • The response rate is significantly lower for questionnaires with an appointment. • [Chart: RY2003 (Non-ASM surveys)]
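The slides do not say which test supports "significantly lower". One standard option for comparing two response rates is a two-proportion z-test, sketched below using only the standard library. It is unweighted and ignores the survey design, so it is an illustration rather than the analysis actually performed. The function name is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(resp_a: int, n_a: int, resp_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two response rates (sketch).

    Group a could be questionnaires with an appointment, group b those without.
    Returns (z, p_value).
    """
    p_a, p_b = resp_a / n_a, resp_b / n_b
    # Pooled rate under the null hypothesis that both groups respond equally.
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A negative `z` with a small `p_value` would indicate a lower response rate in group a.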
Response Rates: Scheduling of the Appointment • The response rate is significantly lower for questionnaires whose appointment is made toward the end of the collection period.
Other Facts • The longer a questionnaire stays in appointment, the greater the probability that it is a non-response at the end of the collection period. • 23.8% of the questionnaires with appointments were classified as non-respondents because their cases were still open at the end of the collection period.
Appointment: Conclusion • When possible, appointments should be avoided, especially toward the end of the collection period. • When an appointment is made, follow-up should occur soon after the appointment is made. Nevertheless, an appointment remains a good way of improving response rates. • The treatment of appointments in the score function should be modified: extra “In progress” units will be promoted to Priority 1 to compensate for possible non-response.
Facts / Findings • A unit may have no appointment date, or one that changes constantly. • Many appointment dates are within a few weeks. • It was decided to consider only units with a late appointment date, and there are few of them.
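The modified treatment can be read in the following way, sketched under explicit assumptions:
- All units passed in are In progress, and each carries a score, a priority and an optional appointment date.
- "Late" means after a hypothetical cutoff date.
- The score held by Priority 1 units with late appointments is treated as at risk.
- Priority 0 units are promoted in descending score order until that at-risk amount is covered.

The function name and the cutoff are not from the source.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def compensate_late_appointments(
    units: Dict[str, Tuple[float, int, Optional[date]]],
    cutoff: date,
) -> Dict[str, int]:
    """Promote extra Priority 0 units to offset late-appointment Priority 1s (sketch).

    units: SU id -> (score, priority, appointment date or None); all In progress.
    cutoff: hypothetical date after which an appointment counts as "late".
    """
    priorities = {su: prio for su, (_, prio, _) in units.items()}
    # Score held by Priority 1 units whose appointment falls after the cutoff.
    at_risk = sum(score for score, prio, appt in units.values()
                  if prio == 1 and appt is not None and appt > cutoff)
    promoted = 0.0
    candidates = sorted((su for su, (_, prio, _) in units.items() if prio == 0),
                        key=lambda su: units[su][0], reverse=True)
    for su in candidates:
        if promoted >= at_risk:
            break
        priorities[su] = 1  # promote to compensate for possible non-response
        promoted += units[su][0]
    return priorities
```

If no Priority 1 unit has a late appointment, the at-risk amount is zero and no unit is promoted. This matches the finding that few units are affected.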
Facts / Findings • An appointment can mean many things. • Many unexpected factors made the changes less effective than initially expected.
Human Errors • The interviewer: • Enters the wrong value for a variable (for example, appointment reason) • Does not update a key variable (for example, appointment date)
System Problems • System failures • As a result, some variables, such as the number of attempts, are affected. • Files not properly loaded • Missing values or variables • Some follow-up events occur outside of the system
Theoretical / Practical • The appointment date is also used to set the “remail” (re-mailing of the questionnaire) and fax dates. • Some appointment dates are default dates, which differ from survey to survey. • An appointment is also used as a reminder for the interviewer to call back a respondent who was unavailable at the time of the initial call.
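Given the human errors, system problems and default dates listed above, an appointment date could be screened before it is used in the score function. The sketch below shows one such check. The rules (missing, survey-specific default, or already past) and the `default_dates` table are assumptions for illustration, not the documented UES procedure.

```python
from datetime import date
from typing import Dict, Optional, Set


def usable_appointment(appt: Optional[date], survey: str, today: date,
                       default_dates: Dict[str, Set[date]]) -> bool:
    """Return True if an appointment date looks usable by the score function (sketch).

    default_dates: hypothetical table of survey -> dates the system fills in by default.
    """
    if appt is None:
        return False  # no appointment date recorded
    if appt in default_dates.get(survey, set()):
        return False  # survey-specific default date, not a real appointment
    if appt < today:
        return False  # past date: possibly not updated by the interviewer
    return True
```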
Future Developments • Establish what an appointment really is; conduct further studies on appointments. • Study more paradata to “quantify” the importance of each unit, assign priorities and improve the score function. • Introduce a cost function to help assign the priority and the type of follow-up. • Combine the ASM score function and the Non-ASM score function.
Thank You! Questions? For more information, please contact: Rose.Evra@statcan.ca or Sylvie.DeBlois@statcan.ca