Editing And Imputation For Manufacturing Statistics At Statistics Canada. Marie Brodeur Director General, Industry Statistics Branch Santiago, Chile March 15 to 17, 2011. Outline Of The Presentation. Overview of the Manufacturing Program Centralized Process Surveys
Editing And Imputation For Manufacturing Statistics At Statistics Canada
Marie Brodeur, Director General, Industry Statistics Branch
Santiago, Chile, March 15 to 17, 2011
Outline Of The Presentation • Overview of the Manufacturing Program • Centralized Process • Surveys • Overview of the UES Survey Process • Post Collection Processing Inputs & Tools • Use of Tax Data • The many phases of UES Post Collection Process • Managing the UES Post Collection Process
Statistics Canada [organization chart]
• Top level: Business and Trade Statistics
• Second level: Industry Statistics; Economy-wide Statistics; Agriculture, Technology and Transportation Statistics
• Other boxes shown: Manufacturing and Energy; Consumer Prices; Agriculture; Distributive Trades; International Trade; Small Business And Special Surveys; Service Industries; Producer Prices; Science, Innovation And Electronic Information; Enterprise Statistics; Investment and Capital Stock; Transportation
Who Are Manufacturers?
• Establishments primarily engaged in the physical or chemical transformation of materials and substances into new products
• Includes assembly of the component parts of manufactured goods, blending of materials, and finishing of manufactured products by dyeing, heat treating, plating and similar operations
• Transformation of own materials or those owned by others
• Service outputs: custom work, repair and maintenance
• Product outputs: finished goods, intermediate goods
Manufacturing Program At Statistics Canada (STC)
• Monthly Survey of Manufacturing (MSM)
• Annual Survey of Manufactures and Logging (ASML)
• Series of sub-annual commodity surveys
General Characteristics Of The MSM
• Monthly indicator of manufacturing activity
• Last redesign in 1999
• Designed to be a reliable indicator for both trends and levels
• Establishment survey (n = 10,500)
• Stratified by province, NAICS and size
MSM Concepts
• Sales: goods of own manufacture
• Inventories: raw materials, goods-in-process, finished products
• Orders: new orders, unfilled orders
• Goods purchased for resale (revenue and inventory): these data are collected but not released
• Sales is the main concept; exceptionally, production is used for some industries (aerospace and shipbuilding)
MSM Sampling Plan
[Diagram: Take-All, Take-Some and Take-None strata; labels shown: Survey, Tax replaced]
MSM Sampling Plan: Use Of Tax
Background
• The Goods and Services Tax (GST) is the federal value-added tax
• GST is collected by the Canada Revenue Agency (CRA)
• The CRA provides tax data to Statistics Canada
• Information received includes the Business Number, revenue, tax remitted and input tax credit
Use Of Tax Data
Who is replaced?
• Single-establishment enterprises
• Replace 50% of sampled data with GST data
• Chronic refusals
Who is not replaced?
• Very large single-establishment enterprises
• Complex units (i.e. multiple establishments), as found in the GST database
What Is The Annual Survey Of Manufactures And Logging (ASML)?
• Measures the contribution of manufacturing industries to economic activity in Canada
• In 2010, manufacturing accounted for 15% of GDP and 12% of total employment (SEPH)
• Key input to SNA Input-Output tables
• The survey collects data on:
• what commodities are produced (Make matrix)
• where commodities are destined (provincial I/O tables)
• what commodities and primary inputs are used in production (Use matrix)
Survey Coverage
• ASML is conducted under the umbrella of Statistics Canada’s Unified Enterprise Survey Program (UES)
• Same as MSM
• Establishments primarily engaged in manufacturing and logging activities and classified to NAICS 31, 32 and 33 as well as NAICS 113
• Estimates produced for 261 industries at the NAICS 6-digit level
• Estimates produced for the 10 provinces and 3 territories
Content: Commodity Variables
• Revenue variables (16), expense variables (43), detailed opening and closing inventories (12), other financial variables (5)
• Sales or output variables are valued at producer (FOB factory gate) prices, as required by the SNA
• Commodities consumed (inputs) and produced (outputs), both goods and services
• Commodity values and quantities are collected (quantities for selected goods)
• Services produced and consumed are collected as expense items and classified based on the Chart of Accounts (COA)
Types Of Administrative (Tax) Data • From the Canada Revenue Agency (CRA) • Agreement between the two agencies • T1 (unincorporated businesses) • T2 (incorporated businesses) • T4 (pay slips) • GST (Goods and Services Tax) • PD7 (payroll deduction accounts)
Why A Centralized Process?
• Best practices
• Standardization of processes
• Cross-survey comparisons
• Enterprise-centric processing / coherence analysis
• Efficient use of resources
• Transportable knowledge across survey programs
Challenges Of A Centralized Process
• Remain centralized
• Distribute processing
• Priority setting
• Communication and coordination
UES Post-Collection Processing
[Flow diagram. Boxes shown: “Clean” Records; Tax Data; Central Data Store; Pre-Grooming; USTART; Edit & Imputation; Subject Matter Review & Correction Tool; Allocation / Estimation]
Collection
• Collection period: February to early October
• Collection processing system: Blaise
• Blaise can be seen as a collection control center
• Blaise has many functions: call scheduler, transaction history files, audit trail files, and more
Blaise: Variables
• Questionnaire number
• Mail-out date
• Number of calls
• Length of the call
• Number of contact attempts
• Response code
• And more
Blaise: Bonuses Over The Years
Blaise Transaction History (BTH) files
• Collection data analysis
• Produced a paper on the best time to call
• Produced a paper on the maximum number of attempts
Audit trail files
• Find outliers
• Identify questions that are difficult to answer
Collection
• Pre-contact (Dec-Jan): mostly for Business Register (BR) births; verification of contact information (name, address, …); by phone (in a few cases, a letter or a fact sheet is sent)
• Mail-out of questionnaires (Jan-March): 2 or 3 mail-out dates
• Follow-up in case of non-response for some units (begins about a month after mail-out): phone call, remail or fax
• Mail-back of questionnaires
• Verification of received questionnaires / edits: is the questionnaire complete, or are some key variables missing? (Edit follow-up by phone in some cases)
Collection (continued)
• Coding of questionnaires (about 20 response codes): response, non-response, out-of-scope, …
• Imaging / data capture (CADI: Computer Assisted Data Input)
Centralized Collection
[Flow diagram. Boxes shown: Pre-Contact (17K businesses); Score Function; Mailout (38K CEs); Edit / Verification (Blaise); Receipt (75% target); “Clean” Records; Capture / Imaging; Delinquent Follow-Up]
UES: Data Collection / Score Function
• Introduced in 2002, the UES score function is the main tool used at the collection stage to set the follow-up priority of about 23,000 Collection Entities (CEs) each year
• Reduces collection costs yet retains data quality
• Consistent with the collection goal of obtaining a high weighted coverage response rate
• PRIORITY 1: extensive follow-up for the larger-revenue CEs in cases of non-response
• PRIORITY 0: minimum follow-up for the smaller CEs in cases of non-response
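A minimal sketch of how a score function of this kind could split CEs into PRIORITY 1 and PRIORITY 0. The slide does not give the actual UES formula; the ranking by weighted revenue, the `coverage_target` parameter and the name `assign_priority` are assumptions made for illustration only.

```python
def assign_priority(units, coverage_target=0.9):
    """Assign follow-up priority (1 or 0) to collection entities.

    units: list of (ce_id, weighted_revenue) pairs.
    coverage_target: hypothetical share of total weighted revenue that
    PRIORITY 1 units should cover (not taken from the source).
    """
    total = sum(rev for _, rev in units)
    priorities = {}
    covered = 0.0
    # Visit the largest contributors first.
    for ce_id, rev in sorted(units, key=lambda u: u[1], reverse=True):
        if total > 0 and covered < coverage_target * total:
            priorities[ce_id] = 1   # extensive follow-up
            covered += rev
        else:
            priorities[ce_id] = 0   # minimum follow-up
    return priorities


example_units = [("A", 500.0), ("B", 300.0), ("C", 150.0), ("D", 50.0)]
example_priorities = assign_priority(example_units, coverage_target=0.9)
```

With these illustrative figures, the three largest CEs are needed to reach 90% of weighted revenue, so only the smallest one receives minimum follow-up.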
Chart Of Accounts
[Diagram: the Chart of Accounts as the link, bridge and concordance between collection and dissemination. Concepts shown: operating surplus, value added, shipments, outputs, inputs, GDP, EBIT, sales, gross profit, operating revenue, cost of sales, expenses]
Expected Benefits Of A Chart Of Accounts
• Standardization in business data collection
• Higher survey response
• Increase in quality of data
• Comparison of data from various sources
• Increased efficiency in using administrative data
Links To Chart Of Accounts
[Diagram: the Chart of Accounts linked to the establishment, the enterprise and the legal entity]
UES: Use Of Tax Data • Validation (comparison) • Verify dubious collected data against the equivalent tax data record • Imputation • One of the methods used for non-response • Estimation • Below take-none • Direct Data Replacement • Update Business Register • Allocation of survey data (use tax revenues, salaries and expenses)
Centralized Processing Systems And Databases • Develop centralized systems • Move away from stand-alone • Single point of access for security • Integrated Questionnaire Metadata System • Edit and imputation • Allocation and Estimation • Data Warehouse
Enterprise Portfolio Managers • Top 350 enterprises in Canada • Status • Platinum, Gold, Silver, Bronze • Personal visits • Enterprise Profiling • Coordination of mail-out and collection • Enterprise/ Establishment coherence • Holistic Response Management • Strategic Response Unit • Escalation Process / Statistics Act
Review and Correction (Post-Capture)
• Done via an application which is a micro-editing tool
• Opportunity to perform edits and to manually correct data before the automated edit and imputation process
• Opportunity to gain an understanding of the quality of data coming in from the field
What Is Generally Done By SMOs During This Process?
• Ensure that industry codes are valid and response codes are correct
• Ensure that equivalent survey cells have consistent data
• Enter data for records that came in after the collection cut-off date
• Review high-impact outliers in terms of profit, average salary, etc.
• Check comments made by respondents and collection staff
Why Is This Process Necessary?
• Reviewing and correcting records increases the number and quality of donors for the automated edit and imputation (E&I) stage, which improves the quality of data coming out of E&I
• Need to assess the quality of collected data
• Determine whether there are problems with the questionnaire
• Identify cases where the respondent is unable to provide a given data point
• Determine whether there are enough data for E&I
What Should Not Be Done During This Process? Do not plug data for non-response records. They will be imputed during the automated E&I.
What Is E & I?
Editing
• Verify that parts add up to the total
• Ensure that there are no missing values where parts add up to a total
• There must be consistency between related variables
Imputation
• Changing values in fields which fail edit rules, with a view to ensuring that the resulting data satisfy all edit rules
• In practice, reported data will rarely be changed
• Impute for missing data or partially responded data
• Impute entire records in the case of total non-response
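The editing rules above can be illustrated with a short sketch. The rule format, the field names and the `tolerance` parameter are hypothetical and are not taken from the UES specifications.

```python
def failed_edits(record, sop_rules, tolerance=0.5):
    """Return the list of (total_name, reason) edits that a record fails.

    record: dict of variable name -> value (None means missing).
    sop_rules: dict mapping a total to the list of its parts.
    tolerance: hypothetical rounding tolerance for the sum-of-parts check.
    """
    failures = []
    for total_name, part_names in sop_rules.items():
        values = [record.get(name) for name in [total_name, *part_names]]
        if any(v is None for v in values):
            # A value is missing where parts should add up to a total.
            failures.append((total_name, "missing"))
        elif abs(values[0] - sum(values[1:])) > tolerance:
            # The parts do not add up to the reported total.
            failures.append((total_name, "sum_of_parts"))
    return failures


rules = {"total_revenue": ["sales", "other_revenue"]}
clean = {"total_revenue": 100, "sales": 80, "other_revenue": 20}
partial = {"total_revenue": 100, "sales": 80, "other_revenue": None}
inconsistent = {"total_revenue": 100, "sales": 80, "other_revenue": 30}
```

Records that fail no edits, such as `clean`, are the ones that can serve as donors; the others are candidates for imputation.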
Why Is E&I Necessary?
• To produce a complete and consistent data file that accounts for all sampled units
• Both units that did not respond to the survey and units that did not provide a complete response must be imputed
• To correct erroneous responses
E&I Terminology
Data Group
• Groupings of records (defined by subject matter) that will be kept together for imputation purposes
• These groupings are based on multiple dimensions: industry (NAICS) and geography (province)
• The data groups used for a specific survey depend on:
• the initial sample design (number of units sampled and the level of stratification used)
• the number of records that respond to the survey (a minimum of 5 or 10 records is required in a data group)
• May be changed during production if there are not enough donors
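A minimal sketch of the minimum-donor rule for data groups. The fallback used here (dropping the province dimension and keeping the NAICS code) is an assumption for illustration; the slide only says that data groups may be changed when there are not enough donors.

```python
def donor_pool(recipient, donors, min_donors=5):
    """Return the donors eligible for a recipient record.

    The detailed data group is NAICS x province. If it holds fewer than
    min_donors records, fall back to all donors with the same NAICS code
    (a hypothetical collapsing rule).
    """
    same_cell = [
        d for d in donors
        if d["naics"] == recipient["naics"]
        and d["province"] == recipient["province"]
    ]
    if len(same_cell) >= min_donors:
        return same_cell
    return [d for d in donors if d["naics"] == recipient["naics"]]


example_donors = (
    [{"id": i, "naics": "311", "province": "ON"} for i in range(5)]
    + [{"id": 5 + i, "naics": "311", "province": "QC"} for i in range(2)]
)
```

In this example a recipient in Ontario keeps its five provincial donors, while a recipient in Quebec, with only two, draws on all seven donors in the industry.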
E&I Terminology (continued)
Edit Group
• Grouping of variables within a record that will be processed together in an imputation method
• For most surveys, edit groups may generally be defined as follows:
• revenue and expense sections
• employment section
• provincial distribution of goods/services sold
• Allows a record to be a donor if it has clean data in one section even when other sections are blank; this increases the donor pool
E&I Terminology (continued)
Key variables
• Total operating revenue
• Total operating expenses
• Salaries
• Cost of goods sold
The Stages Of The E&I System
• Pre-processing
• BANFF E & I System
• Post-processing
• Allocation
Preprocessing
• Deterministic edits
• Conditional edits: if A then B
• Sum of Parts (SOP)
• Assign 100% to percentage totals
• Impute reporting period
• Donor outlier detection
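One way to picture a deterministic Sum of Parts edit is that a single missing value is filled in when the other values determine it uniquely. This is a sketch under that assumption; the function name and field names are hypothetical and the actual UES preprocessing rules are not given on the slide.

```python
def deterministic_fill(record, total_name, part_names):
    """Fill a single missing value in a total/parts relationship.

    If exactly one of the total or its parts is missing (None), it is
    derived from the others; otherwise the record is returned unchanged.
    """
    fixed = dict(record)
    names = [total_name, *part_names]
    missing = [n for n in names if fixed.get(n) is None]
    if len(missing) != 1:
        return fixed  # nothing missing, or not uniquely determined
    known_parts = sum(fixed[n] for n in part_names if fixed.get(n) is not None)
    if missing[0] == total_name:
        fixed[total_name] = known_parts
    else:
        fixed[missing[0]] = fixed[total_name] - known_parts
    return fixed


parts = ["sales", "other_revenue"]
no_total = {"total_revenue": None, "sales": 60, "other_revenue": 40}
no_part = {"total_revenue": 100, "sales": 60, "other_revenue": None}
```

Records with two or more missing values are left untouched here, since they would need trend or donor imputation rather than a deterministic edit.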
BANFF E & I System
• Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)
• Impute for other missing variables:
• Apply historical trend
• Apply current-year trend
• Use a donor (for partial imputation)
• Select a donor for massive imputation for total non-response
BANFF Algorithms
• DIFTREND: historical trend imputation
• CURRATIO: current ratio imputation
• PREVALUE: the value from the previous period for the same unit is imputed
• PREAUX: historical value of a proxy variable for the same unit
• CURAUX: current value of a proxy variable for the same unit
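The five estimators can be sketched as plain functions that follow the one-line descriptions above. These are not calls to the Banff software, and the formulas for the trend and ratio estimators are an assumption about how such estimators are usually defined; the slide does not state them.

```python
def prevalue(prev_value):
    """PREVALUE: carry forward the unit's value from the previous period."""
    return prev_value


def preaux(prev_aux):
    """PREAUX: use the unit's previous-period proxy (auxiliary) value."""
    return prev_aux


def curaux(cur_aux):
    """CURAUX: use the unit's current-period proxy (auxiliary) value."""
    return cur_aux


def diftrend(prev_value, cur_aux, prev_aux):
    """DIFTREND (assumed form): previous value adjusted by the unit's
    own trend in an auxiliary variable."""
    if prev_aux == 0:
        return None  # trend undefined
    return prev_value * cur_aux / prev_aux


def curratio(cur_aux, respondents):
    """CURRATIO (assumed form): unit's auxiliary value times the ratio of
    the variable to the auxiliary among current-period respondents.

    respondents: list of (value, aux) pairs.
    """
    total_aux = sum(aux for _, aux in respondents)
    if total_aux == 0:
        return None  # ratio undefined
    total_value = sum(value for value, _ in respondents)
    return cur_aux * total_value / total_aux
```

For instance, a unit that reported 200 last year and whose tax revenue moved from 100 to 110 would be imputed `diftrend(200, 110, 100)`, that is, last year's value scaled by the same 10% growth.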
Post-Processing
• Prorate components to ensure that they sum exactly to totals
• Perform a number of consistency checks to ensure that micro-data are valid
• Assign customer location (percentage cells)
• Massive imputation (donor selected during processing but applied in the post-processor)
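Prorating can be sketched as scaling each component by the same factor and then absorbing any rounding residual so that the components sum exactly to the total. The integer rounding and the choice to place the residual on the largest component are assumptions for illustration, not the documented UES rule.

```python
def prorate(parts, total):
    """Scale integer components so that they sum exactly to total.

    parts: dict of component name -> value.
    Returns a new dict; if the components sum to zero they cannot be
    scaled and are returned unchanged.
    """
    current = sum(parts.values())
    if current == 0:
        return dict(parts)
    scaled = {name: round(value * total / current)
              for name, value in parts.items()}
    # Put any rounding residual on the largest component.
    residual = total - sum(scaled.values())
    largest = max(scaled, key=scaled.get)
    scaled[largest] += residual
    return scaled


expenses = {"wages": 60, "materials": 30, "other": 20}
prorated = prorate(expenses, 100)
```

Here the reported components sum to 110 against a total of 100, so each is scaled by 100/110 and rounded.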
Allocation: Definition & Purpose
Definition
• Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame
Purpose
• To provide fully processed micro-data on a fiscal-year basis for establishments or locations in-sample for the UES
• To determine the distribution of value added by province
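A minimal sketch of allocation from a Collection Entity to its establishments, assuming the value is distributed in proportion to an auxiliary size measure such as tax revenue or salaries. The proportional rule, the equal-split fallback and the names used are assumptions for illustration.

```python
def allocate(ce_value, establishment_sizes):
    """Distribute a Collection Entity value across its establishments.

    establishment_sizes: dict of establishment id -> auxiliary size
    (for example tax revenue). Shares are proportional to size; if all
    sizes are zero, the value is split equally (hypothetical fallback).
    """
    if not establishment_sizes:
        return {}
    total_size = sum(establishment_sizes.values())
    if total_size == 0:
        equal_share = ce_value / len(establishment_sizes)
        return {est: equal_share for est in establishment_sizes}
    return {est: ce_value * size / total_size
            for est, size in establishment_sizes.items()}


allocated = allocate(1000.0, {"E1": 300.0, "E2": 100.0})
```

Summing the allocated establishment values by province would then give a provincial distribution of the variable.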