
Implementing Coding Tools for a New Classification

Presentation Transcript


    1. Implementing Coding Tools for a New Classification
        John Perry, UK Office for National Statistics

    2. Operation 2007: the players
        - In the UK: the Standard Industrial Classification of Economic Activities (SIC) (current version SIC (2003))
        - In Europe: NACE, the Nomenclature générale des activités économiques dans les Communautés européennes (Statistical Classification of Economic Activities in the European Community) (current version NACE Rev 1.1)
        - In the UN: ISIC, the International Standard Industrial Classification of all Economic Activities (current version ISIC Rev 3.1)

    3. The UK SIC:
        - is a 5-digit classification system
        - is required, by EU legislation, to be identical to NACE down to and including the 4-digit Class level
        - contains a national 5th-digit level which does not exist in NACE
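The relationship between the two levels can be illustrated with a short Python sketch. It assumes UK SIC codes are held as 5-character digit strings (as with 01120 and 51880 later in this presentation) and that the NACE Class is written in the usual NN.NN form; both function names are hypothetical.

```python
def nace_class_from_uk_sic(sic_code):
    """Return the NACE Class (NN.NN) implied by a 5-digit UK SIC code.

    Digits 1-4 must match NACE; digit 5 is the UK-only national level,
    so it is simply dropped."""
    if len(sic_code) != 5 or not sic_code.isdigit():
        raise ValueError(f"expected a 5-digit UK SIC code, got {sic_code!r}")
    return f"{sic_code[:2]}.{sic_code[2:4]}"


def share_nace_class(sic_a, sic_b):
    """True if two UK SIC codes agree on their first four digits."""
    return nace_class_from_uk_sic(sic_a) == nace_class_from_uk_sic(sic_b)


print(nace_class_from_uk_sic("51880"))  # 51.88
```

Any two UK codes that differ only in the 5th digit therefore map to the same NACE Class, which is what makes the national level invisible at European level.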

    4. The Results – changes in structure

    5. ACTR as an aid to coding
        - ACTR (Automatic Coding by Text Recognition), developed by Statistics Canada
        - The ONS standard tool for coding, initially for industry and occupation
        - Replaces Precision Data Coder for industry coding
        - Determines a code from a text description
        - The extent to which the process is automated is controlled by parameters

    6. Knowledge bases: SIC2003
        ACTR relies heavily on indexes of standard descriptions:
        - Business descriptions from responses to the Business Register Survey
        - The published index for SIC2003
        - The short descriptions for each SIC2003 code
        - Standard descriptions for construction industry statistics
        - Trade code descriptions for PAYE (Pay As You Earn tax) employers
        - Farm type descriptions
        giving a total of more than 30,000 standard descriptions.

    7. How ACTR works
        - Each input description is converted to a standard form.
        - This is compared with the standard forms of the descriptions held in the knowledge base.
        - The closeness of the match is expressed as a score between 0 and 10.
        - The system has rules to decide whether the score is high enough to confirm a match:
          - a score of more than 7.5 is required to code automatically (our setting, which may differ for other data sets)
          - lower scores are passed to interactive coding
        - Coding does not depend on the order in which the knowledge bases are checked.
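A minimal Python sketch of this match-and-threshold logic follows. It is not ACTR: the stop-word list, the naive plural stripping and the Dice-coefficient score are illustrative assumptions, and ACTR's own scoring is more sophisticated. Only the 0 to 10 scale and the 7.5 threshold come from the slide.

```python
import re

# Illustrative stop words only; ACTR's real parsing rules are much richer.
STOP_WORDS = {"A", "AN", "AND", "ETC", "FOR", "IN", "OF", "THE", "TO"}

# ONS setting quoted on the slide; other data sets may use a different value.
AUTO_CODE_THRESHOLD = 7.5


def standardise(text):
    """Reduce a description to a crude standard form: upper case, stop words
    dropped, a trailing plural 'S' stripped, duplicates removed, words sorted."""
    kept = set()
    for word in re.findall(r"[A-Z]+", text.upper()):
        if word in STOP_WORDS:
            continue
        if len(word) > 3 and word.endswith("S") and not word.endswith("SS"):
            word = word[:-1]  # naive de-pluralisation, e.g. SERVICES -> SERVICE
        kept.add(word)
    return " ".join(sorted(kept))


def score(a, b):
    """Dice similarity of two standard forms scaled to 0-10 (not ACTR's formula)."""
    wa, wb = set(a.split()), set(b.split())
    if not wa or not wb:
        return 0.0
    return 10.0 * 2 * len(wa & wb) / (len(wa) + len(wb))


def code_description(text, knowledge_bases):
    """Return (route, sic_code, score) for the best match across all bases.

    knowledge_bases is a list of lists of (index_description, sic_code).
    Every entry is pooled and ranked by (score, code), so the answer does not
    depend on the order in which the knowledge bases are supplied."""
    target = standardise(text)
    candidates = [
        (score(target, standardise(desc)), code)
        for base in knowledge_bases
        for desc, code in base
    ]
    if not candidates:
        return ("interactive", None, 0.0)
    best_score, best_code = max(candidates)
    route = "automatic" if best_score > AUTO_CODE_THRESHOLD else "interactive"
    return (route, best_code, best_score)


print(standardise("Horticultural services"))  # HORTICULTURAL SERVICE
print(standardise("Sales and service of horticultural machinery"))
# HORTICULTURAL MACHINERY SALE SERVICE
```

With the slide 12 example, `standardise` reproduces both standard forms shown there. The sketch scores the pair at about 6.67 rather than ACTR's 6.911, but both fall below 7.5, so the description would go to interactive coding. Pooling every entry and taking the maximum of (score, code) is one simple way to make the result independent of knowledge-base order.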

    8. Extract from Business Register Survey Questionnaire

    12. ACTR process
        Supplied text: Horticultural services
          Standard form: HORTICULTURAL SERVICE
        Best-fit index entry: Sales and service of horticultural machinery
          Standard form: HORTICULTURAL MACHINERY SALE SERVICE
        Score: 6.911 (out of 10)
        ACTR's preferred SIC 2003 code: 51880 (Wholesale of agricultural machinery and accessories)

    14. Interactive coding
        - Scores below 7.5 are passed to clerical staff for interactive coding.
        - The system presents the options in descending order of score.
        - If none of the choices looks good, staff modify the description.
        - Once a decision is made, the coder confirms the choice.
        - The index description is then held on the IDBR (Inter-Departmental Business Register).
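The ordering and confirmation steps could be modelled as below. This is a hypothetical Python sketch: the candidate tuple layout, the default limit of five options and the tie-break on SIC code are assumptions, not details of the ONS system.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate record: (score, index_description, sic_code)
Candidate = Tuple[float, str, str]


def interactive_options(candidates: List[Candidate], limit: int = 5) -> List[Candidate]:
    """Rank candidate matches for the clerk, highest score first.
    Ties are broken by SIC code so the presented order is stable."""
    return sorted(candidates, key=lambda c: (-c[0], c[2]))[:limit]


def confirm_choice(options: List[Candidate], picked: Optional[int]) -> Optional[Tuple[str, str]]:
    """Return the (index_description, sic_code) the clerk confirmed.
    None means no option was acceptable: the description is edited and re-scored."""
    if picked is None:
        return None
    _, description, code = options[picked]
    return description, code
```

Returning `None` rather than forcing a choice mirrors the slide's fallback, where staff rewrite an unhelpful description instead of accepting a poor match.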

    15. Introducing SIC2007 (NACE Rev 2)
        - New index files: SIC2007 headings and the SIC2007 index
        - Initially, code forward from SIC2003 using bridging codes: codes attached to each knowledge-base entry that link SIC2003 and SIC2007
        - Later, change to coding backwards from SIC2007
        - Eventually, dual coding will cease

    16. Impact of ACTR on the IDBR at micro level
        - Existing SIC 2003 code: 01120 (Growing of vegetables etc.)
        - ACTR's preferred SIC 2003 code: 51880 (Wholesale of agricultural machinery and accessories)
        - The SIC 2007 code comes from the bridging code:
          SIC 2003: 51880 → Bridging code: MTOLR → SIC 2007: 46610
        - The SIC 2003 code will change, but only when agreed
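Dual coding through a bridging code can be sketched as a two-step lookup. The `IndexEntry` record, the `BRIDGE_TO_SIC2007` table (holding only the MTOLR example from the slide) and the reading that the SIC 2007 code is taken from the bridge while the SIC 2003 change waits for agreement are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One knowledge-base entry carrying its SIC2003 code and bridging code."""
    description: str
    sic2003: str
    bridging_code: str


# Hypothetical bridging table containing only the example from the slide.
BRIDGE_TO_SIC2007 = {"MTOLR": "46610"}


def dual_code(entry):
    """Return both classifications for an entry via its bridging code."""
    return {"SIC2003": entry.sic2003, "SIC2007": BRIDGE_TO_SIC2007[entry.bridging_code]}


def proposed_update(existing_sic2003, entry, agreed):
    """Assumed rule: SIC2007 is taken from the bridge, while the unit's
    SIC2003 code is replaced only once the change has been agreed."""
    codes = dual_code(entry)
    return {
        "SIC2003": codes["SIC2003"] if agreed else existing_sic2003,
        "SIC2007": codes["SIC2007"],
    }


entry = IndexEntry("Sales and service of horticultural machinery", "51880", "MTOLR")
print(proposed_update("01120", entry, agreed=False))
# {'SIC2003': '01120', 'SIC2007': '46610'}
```

Because the bridging code sits on the knowledge-base entry rather than on the unit, one ACTR match yields codes in both classifications at once.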

    17. Conversion to SIC2007
        - ACTR will deal with units that have a suitable business description.
        - Conversion tables will deal with:
          - units with descriptions that ACTR is unable to code (vague descriptions)
          - units without a description
          - units supplied through administrative sources (existing VAT traders, PAYE employers, registered companies)

    18. Creation of conversion tables
        Tables have been created to convert units from SIC2003 to SIC2007 by:
        - using ACTR bridging codes
        - coding existing data through ACTR
        - producing a cross-tabulation of SIC2003 against SIC2007
        - allocating on a probability basis, rounded to the nearest 5%
        - validating relationships against the acceptable range of industries
        Best-fit tables have also been produced for users who cannot accommodate probability-based conversion.
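The cross-tabulation, 5% rounding and best-fit steps might look like the following Python sketch. The input format (pairs of SIC2003 and SIC2007 codes for units coded through ACTR), the optional `allowed` mapping standing in for the validation step, and the tie-break for best fit are hypothetical. Independently rounded shares need not sum exactly to 100%, so a production table would have to reconcile them.

```python
import math
from collections import Counter, defaultdict


def build_conversion_tables(pairs, allowed=None):
    """Build SIC2003 -> SIC2007 conversion tables from coded units.

    pairs:   iterable of (sic2003, sic2007) for units coded through ACTR.
    allowed: optional {sic2003: set of acceptable sic2007 codes}; links outside
             it are discarded (a stand-in for the validation step).
    Returns (probability_table, best_fit_table)."""
    crosstab = defaultdict(Counter)
    for sic2003, sic2007 in pairs:
        if allowed is not None and sic2007 not in allowed.get(sic2003, set()):
            continue
        crosstab[sic2003][sic2007] += 1

    probability_table, best_fit_table = {}, {}
    for sic2003, counts in crosstab.items():
        total = sum(counts.values())
        shares = {}
        for sic2007, n in counts.items():
            # Round half up to the nearest 5% (0.05); zero shares are dropped.
            share = math.floor(n / total * 20 + 0.5) / 20
            if share > 0:
                shares[sic2007] = share
        probability_table[sic2003] = shares
        # Best fit: the most frequent SIC2007 code, ties going to the lowest code.
        best_fit_table[sic2003] = min(counts, key=lambda code: (-counts[code], code))
    return probability_table, best_fit_table
```

The best-fit table keeps a single target per SIC2003 code, which is what users who cannot work with fractional allocations would need.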

    19. Coding process

    20. Impact on the IDBR at macro level
        The impact on SIC 2003 is confined to those reporting units that have business descriptions for their local units, where ACTR can code them.
        Local units                      Count
        ACTR codes                     620,000
        ACTR does not code             210,000
        No business description        340,000
        Administrative data only     1,660,000
        Total local units            2,830,000
        SIC 2007 comes from the bridging codes only where ACTR codes; otherwise, SIC 2007 comes from conversion from SIC 2003.
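The arithmetic behind this table, and the rule for where each unit's SIC 2007 code comes from, can be checked with a few lines of Python. The counts are those on the slide; the category labels and the helper function are illustrative.

```python
# Local-unit counts from the slide (provisional).
LOCAL_UNITS = {
    "ACTR codes": 620_000,
    "ACTR does not code": 210_000,
    "No business description": 340_000,
    "Administrative data only": 1_660_000,
}


def sic2007_source(category):
    """SIC 2007 comes from the bridging codes only where ACTR codes the unit."""
    return "bridging code" if category == "ACTR codes" else "conversion from SIC2003"


total = sum(LOCAL_UNITS.values())
by_source = {}
for category, count in LOCAL_UNITS.items():
    source = sic2007_source(category)
    by_source[source] = by_source.get(source, 0) + count

print(f"Total local units: {total:,}")  # Total local units: 2,830,000
for source, count in by_source.items():
    print(f"{source}: {count:,} ({count / total:.1%})")
# bridging code: 620,000 (21.9%)
# conversion from SIC2003: 2,210,000 (78.1%)
```

On these figures, roughly one local unit in five receives its SIC 2007 code directly from ACTR, and the remainder depend on the conversion tables.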

    22. Impact at SIC 2003 broad industry level (provisional counts)

    24. Correspondence between SIC 2003 and SIC 2007 for local units coded by ACTR

    25. Implementation timetable

    26. Conclusions
        - The ACTR tool delivers considerable savings in cost, and in burden on businesses, compared with traditional survey approaches.
        - The knowledge base is portable (i.e. independent of the coding engine), so it can be shared with any interested parties, e.g. administrative data suppliers, to increase the consistency of coding.
        - The use of bridging codes permits simultaneous coding to multiple classification systems, which is essential if periods of dual coding are required.
        - The knowledge-base approach can help inform the development of future versions of a classification by providing a reference frame of business activity descriptions.
