
The Many Ways of Improving the Industrial Coding for Statistics Canada’s Business Register

Yanick Beaucage, ICES III, June 2007


Presentation Transcript


  1. The Many Ways of Improving the Industrial Coding for Statistics Canada’s Business Register Yanick Beaucage ICES III June 2007

  2. Overview • Background • Automatic Coding • Manual Coding • Quality Evaluation of Classification Updates • Quality Assurance Survey • Conclusion

  3. Background • STC’s Business Register Redesign • Improve the link to administrative data • Improve treatment of births/deaths • Reflect the reality of businesses • Give update privileges to a larger set of people • Develop a quality assurance program • Part of the quality assurance program is ensuring good industrial classification

  4. Background • Good industrial classification • Leads to better population identification • Leads to smaller sample size • Leads to reduced collection cost • Leads to better precision • Prevents frustration from respondents (and interviewers)

  5-9. Background [Diagram built up over five slides; labels: Statistics Canada, Canada Revenue Agency, Business Register, Automatic, Manual, Updates, QE (twice), QAS]

  10. Automatic Coding • New businesses apply for a Business Number (BN) (done at Canada Revenue Agency - CRA) • In person, over the phone, over the internet, ... • What is the description of the main business activity? • Decision tree tool used by CRA • Prompts for details needed for coding • Returns a robot-phrase to Statistics Canada

  11. Automatic Coding • Assign classification based on robot-phrase • Improving decision tree tool and usage • Re-developed on micro (originally mainframe) • Expand use for Web BN application (currently used for phone or in person registration) • Develop questions for all sectors • Currently used for 75% of all industrial sectors • Covers 90% of all descriptions to be coded
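The decision-tree step on slides 10 and 11 can be pictured with a small sketch. Everything in it is an assumption made for illustration: the questions, the robot-phrases, the placeholder codes (`IND-001`, ...) and the function names are hypothetical and do not come from the CRA tool.

```python
from typing import Optional

# Hypothetical decision tree: a node is either a question with answers that
# lead to child nodes, or a leaf holding a standardized "robot-phrase".
TREE = {
    "question": "Which sector best describes the main activity?",
    "answers": {
        "food": {
            "question": "Are customers served at tables?",
            "answers": {
                "yes": {"robot_phrase": "RESTAURANT FULL SERVICE"},
                "no": {"robot_phrase": "RESTAURANT LIMITED SERVICE"},
            },
        },
        "construction": {"robot_phrase": "CONSTRUCTION GENERAL"},
    },
}

# Hypothetical lookup from robot-phrase to a placeholder industry code.
PHRASE_TO_CODE = {
    "RESTAURANT FULL SERVICE": "IND-001",
    "RESTAURANT LIMITED SERVICE": "IND-002",
}


def walk_tree(node: dict, replies: list[str]) -> Optional[str]:
    """Follow the replies down the tree; return a robot-phrase or None."""
    for reply in replies:
        if "robot_phrase" in node:
            break  # already at a leaf, ignore extra replies
        node = node["answers"].get(reply)
        if node is None:
            return None  # reply not covered by the tree
    return node.get("robot_phrase")  # None if we stopped at a question


def code_from_phrase(phrase: Optional[str]) -> Optional[str]:
    """Return the code for a robot-phrase, or None if it must go further."""
    return PHRASE_TO_CODE.get(phrase) if phrase else None


if __name__ == "__main__":
    phrase = walk_tree(TREE, ["food", "yes"])
    print(phrase, "->", code_from_phrase(phrase))
```

A unit whose replies do not reach a leaf, or whose robot-phrase has no code, returns `None` and would fall through to the next coding stage.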

  12. Automatic Coding • Automated Coding by Text Recognition (ACTR) • Used to assign classification based on descriptions • If the description is too general → manual coding • Reference file (French and English) • Parsing strategy • Word weighting algorithm • Score derived
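The ingredients listed on slide 12 (reference file, parsing, word weights, score) can be illustrated with a generic text-matching sketch. This is not ACTR's actual algorithm: the reference phrases, placeholder codes, stop-word list, inverse-frequency weights, Dice-style score and 0.8 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical reference file: (phrase, placeholder code).
REFERENCE = [
    ("full service restaurant", "IND-001"),
    ("plumbing contractor", "IND-003"),
    ("restaurant supply wholesaler", "IND-004"),
]
STOP_WORDS = {"a", "an", "and", "of", "the", "in"}


def parse(text: str) -> set[str]:
    """Lower-case, keep alphabetic words, drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def word_weights(reference) -> dict[str, float]:
    """Words found in fewer reference phrases get a larger weight."""
    n = len(reference)
    doc_freq = Counter(w for phrase, _ in reference for w in parse(phrase))
    return {w: math.log(1 + n / count) for w, count in doc_freq.items()}


def score(desc: set[str], phrase: set[str], weights: dict[str, float]) -> float:
    """Weighted overlap in [0, 1]; unseen description words weigh 1.0."""
    common = sum(weights[w] for w in desc & phrase)
    total = sum(weights.get(w, 1.0) for w in desc) + sum(weights[w] for w in phrase)
    return 2 * common / total if total else 0.0


def auto_code(description: str, reference=REFERENCE, threshold: float = 0.8):
    """Return (code, score); code is None when no phrase scores high enough."""
    weights = word_weights(reference)
    desc = parse(description)
    best_code, best_score = None, 0.0
    for phrase, code in reference:
        s = score(desc, parse(phrase), weights)
        if s > best_score:
            best_code, best_score = code, s
    return (best_code, best_score) if best_score >= threshold else (None, best_score)


if __name__ == "__main__":
    for text in ("Plumbing contractor", "restaurant"):
        print(text, "->", auto_code(text))
```

In this sketch a one-word description such as "restaurant" matches two reference phrases only weakly, so it is returned without a code, which mirrors the "too general → manual coding" path.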

  13. Automatic Coding • Improving use of ACTR • Improve reference file • Each year new phrases are added • Currently 7 000 phrases • Study score needed for match • Opening the weighting algorithm • Improve parsing rules • Revisit the rules • Create an environment for testing purposes • Evaluate impact of changing input/rules/score

  14. Automatic Coding • 40 000 new businesses a month to code • 45% are coded using robot-phrases • 5% are coded using ACTR • Leaves 20 000 new businesses to code • Need manual coding • Done at Statistics Canada

  15. Manual Coding • Other units to code manually • Survey feedback • New operating entity found when profiling • Tool • Search engine for industrial coding • Improve manual coding • Add on-line ACTR or ACTR results • Add decision tree tool

  16. Manual Coding • New businesses • Goal: code all of them • Reality: do as many as we can • Result: backlog of businesses to code

  17. Manual Coding • New businesses • Goal: code all of them • Reality: do as many as we can • Result: backlog of businesses to code [Diagram labels: CRA May batch, Business Register, Automatic, Manual, Manual Backlog, CRA June batch, Automatic, Manual]

  18. Manual Coding • Which units should be coded first? • First in, first out? • Economic activity signal? • Economic activity is determined by administrative data • Both! Select a sample from backlog • Take-all (large economic activity) • Take-some 1 (economic activity / older units) • Take-some 2 (economic activity / newer units) • Take-none (no economic activity)
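The four strata on slide 18 can be sketched as a simple assignment rule followed by random selection within each stratum. The cut-off for "large" activity, the age that separates older from newer units, and the sampling rates are hypothetical values, not figures from the presentation.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str
    activity: float      # economic activity signal from administrative data
    months_waiting: int  # time spent in the backlog


# Hypothetical cut-offs and sampling rates, chosen only for illustration.
LARGE_ACTIVITY = 1_000_000.0
OLD_MONTHS = 6
RATES = {"take-all": 1.0, "take-some 1": 0.5, "take-some 2": 0.25, "take-none": 0.0}


def stratum(unit: Unit) -> str:
    """Assign a backlog unit to one of the four strata."""
    if unit.activity <= 0:
        return "take-none"
    if unit.activity >= LARGE_ACTIVITY:
        return "take-all"
    return "take-some 1" if unit.months_waiting >= OLD_MONTHS else "take-some 2"


def select_for_coding(backlog, seed: int = 0) -> list[Unit]:
    """Draw a simple random sample within each stratum at its rate."""
    rng = random.Random(seed)
    strata: dict[str, list[Unit]] = {}
    for unit in backlog:
        strata.setdefault(stratum(unit), []).append(unit)
    selected: list[Unit] = []
    for name, units in strata.items():
        n = math.ceil(RATES[name] * len(units))
        selected.extend(rng.sample(units, n))
    return selected


if __name__ == "__main__":
    backlog = [
        Unit("a", 5_000_000.0, 1),
        Unit("b", 0.0, 12),
        Unit("c", 1_000.0, 8),
        Unit("d", 1_000.0, 9),
        Unit("e", 500.0, 1),
    ]
    print([u.unit_id for u in select_for_coding(backlog)])
```

Because units are drawn with known rates within each stratum, a sample like this could also support estimates about the part of the backlog that is not coded.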

  19. Manual Coding • Prioritize units to code • Can produce under-coverage estimates of the backlog by industrial sector • Ultimate goal • Improve automatic coding • 80% - 90%? • Code all remaining active units
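A short calculation shows what the 80% to 90% goal would mean for the manual workload, using the 40 000 new businesses a month and the current 50% automatic rate (45% robot-phrases plus 5% ACTR) from slide 14. The function name is hypothetical, and the sketch assumes the monthly volume stays constant.

```python
MONTHLY_NEW_BUSINESSES = 40_000  # monthly volume quoted on slide 14


def manual_workload(auto_percent: int, monthly_new: int = MONTHLY_NEW_BUSINESSES) -> int:
    """Units left for manual coding when auto_percent are coded automatically."""
    if not 0 <= auto_percent <= 100:
        raise ValueError("auto_percent must be between 0 and 100")
    return monthly_new * (100 - auto_percent) // 100


if __name__ == "__main__":
    for pct in (50, 80, 90):
        print(f"{pct}% automatic -> {manual_workload(pct):,} units to code manually")
```

At 50% the result matches the 20 000 units quoted on slide 14; at 80% and 90% the monthly manual workload would fall to 8 000 and 4 000 units.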

  20. Quality Evaluation of Classification Updates • Update privileges will be expanded • Subject-matter specialists • Collection personnel • Need to evaluate the quality of updates • Prevent systematic errors • Where to focus training

  21. Quality Evaluation of Classification Updates • Two processes • Notification and sample selection • 1- Notification • Specialist determines set of enterprises to look at • Every update to a targeted enterprise is sent to the specialist • Agree/Disagree/Do nothing • Make use of expertise of specialist • Specialists keep up-to-date with their frame

  22. Quality Evaluation of Classification Updates • 2- Sample selection and evaluation • Based on industry, source of industry, size and complexity of enterprise • Re-code and compare • Minimize respondent input when re-coding • Using notification and sample • Produce error rate for industrial coding • Target specific problems
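The "re-code and compare" step can be sketched as a tally of disagreements between the code assigned by an update and the code assigned on re-coding, grouped by the source of the update. The record layout and group names are assumptions, and the sketch ignores sampling weights, which a real estimate from a stratified sample would need.

```python
from collections import defaultdict


def error_rates(records):
    """Return {group: share of records whose re-coded value differs}.

    records: iterable of (group, updated_code, recoded_code) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for group, updated_code, recoded_code in records:
        counts[group][0] += updated_code != recoded_code
        counts[group][1] += 1
    return {group: errors / total for group, (errors, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical evaluation sample with placeholder codes.
    sample = [
        ("collection", "IND-001", "IND-001"),
        ("collection", "IND-002", "IND-004"),
        ("subject-matter", "IND-003", "IND-003"),
        ("subject-matter", "IND-003", "IND-003"),
    ]
    for group, rate in error_rates(sample).items():
        print(f"{group}: {rate:.0%} of updates disagree with the re-code")
```

Grouping by source is one way to see where training might be focused; grouping by industry or enterprise size would work the same way.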

  23. Quality Assurance Survey • Goal: assess the quality of classification on the BR on an on-going basis • Assess dead/alive status as well • Point-in-time surveys done in the past • 1993, 1995, 1997, 2002 • Implement a continuous survey • Produce overall results monthly • Produce detailed results combining 12 months

  24. Quality Assurance Survey • Stratification • Industrial sectors • 2 or 3 size strata • Have higher sampling fraction for larger size • Recently contacted • Considered to have valid classification • Sample allocation • Target 3.5% standard error for annual industrial classification error rate • 550 units a month
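To give a feel for how a standard-error target translates into sample size, the sketch below uses the textbook formula for a proportion under simple random sampling, SE = sqrt(p(1 - p) / n). This is an assumption for illustration only: the slide does not say at which level the 3.5% target applies or how the stratified design enters the calculation, and the function names are hypothetical.

```python
import math

ANNUAL_SAMPLE = 550 * 12  # units collected over 12 months at 550 a month


def standard_error(error_rate: float, n: int) -> float:
    """Standard error of an estimated proportion under simple random sampling."""
    return math.sqrt(error_rate * (1 - error_rate) / n)


def required_sample_size(target_se: float, error_rate: float = 0.5) -> int:
    """Smallest n meeting target_se; error_rate=0.5 is the worst case."""
    return math.ceil(error_rate * (1 - error_rate) / target_se ** 2)


if __name__ == "__main__":
    print("Annual sample:", ANNUAL_SAMPLE)
    print("Worst-case n for a 3.5% standard error:", required_sample_size(0.035))
```

Under these simplifying assumptions, the worst-case requirement per estimate is far smaller than the annual sample, which leaves room for estimates by domain; the actual allocation used for the survey is not described in the slides.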

  25. Quality Assurance Survey • Currently doing a pilot test • Monthly estimates produced • Yearly estimates based on weighted average of 12 monthly measures • Weighted average based on 1/12 • Weighted average based on population ratio over the year (N_m / (N_1 + ... + N_12))
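The two weighting options for the yearly estimate can be written out directly. In this sketch the monthly error rates and population counts are made-up numbers, and the function names are hypothetical; only the two weighting schemes (1/12 each, or each month's population over the sum of the 12 monthly populations) come from the slide.

```python
def yearly_equal_weights(monthly_rates: list[float]) -> float:
    """Average the monthly estimates with a weight of 1/12 each."""
    return sum(monthly_rates) / len(monthly_rates)


def yearly_population_weights(monthly_rates: list[float], populations: list[int]) -> float:
    """Weight month m by its population over the sum of all monthly populations."""
    if len(monthly_rates) != len(populations):
        raise ValueError("need one population count per monthly estimate")
    total = sum(populations)
    return sum(r * n for r, n in zip(monthly_rates, populations)) / total


if __name__ == "__main__":
    # Made-up monthly error rates and register sizes, for illustration only.
    rates = [0.10] * 6 + [0.20] * 6
    pops = [1_000] * 6 + [3_000] * 6
    print("Equal weights:     ", round(yearly_equal_weights(rates), 4))
    print("Population weights:", round(yearly_population_weights(rates, pops), 4))
```

The two schemes agree when the population is stable over the year and diverge when it grows or shrinks, since months with a larger population then count for more.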

  26. Quality Assurance Survey • Survey will be used to • Clean-up the register as an independent source • Evaluate industrial in-scope and out-of-scope rates • Evaluate industrial error rate for non-surveyed portion of the register (e.g. small enterprises) • Evaluate death rate in order to adjust sample sizes • Potential use • Evaluate frame quality for new surveys • Clean-up part of the register

  27. Conclusion • Classification is essential to the BR • Redesign provides an opportunity • To improve coding • To standardize tools used for coding • To measure quality of coding adequately • To set-up good practices/good reports • Results • Better quality of business survey frames • More efficient surveys

  28. Yanick Beaucage 613-951-4622 yanick.beaucage@statcan.ca
