Migration of a large survey onto a micro-economic platform

Val Cox, April 2014

Micro-economic Platform (MEP)
• Standardises and automates processes
  - Provides more efficient processing, more analysis
• Enables Statistics NZ to gain more from available data
  - Basic principle: use administrative data wherever possible, with surveys filling the gaps
  - Objective: bring core information about every business in the economy into the Longitudinal Business DB to allow Statistics NZ to respond quickly to changing needs for economic statistics

Aim of paper
• To discuss the challenges of building a non-response imputation package for a large survey on the MEP
  - Rationalises the use of
    • Banff for outlier detection and imputation
    • SEVANI (System for Estimation of Variance due to Nonresponse and Imputation) to estimate sampling and non-sampling errors

Annual Enterprise Survey (AES)
• Provides statistics on the financial performance and position of New Zealand businesses
  - Captures about 90% of New Zealand's GDP
• Uses four different major data sources
  - Three administrative sources (covering 72% of the population)
  - One postal survey

AES before MEP

Editing strategy of AES on MEP
• Guided by the Methodological Standard for E&I (editing and imputation)
• Key objective of the standard
  - Editing is fit-for-purpose and enables continuous improvement of processes and data quality
• Key principles used
  - Automate editing processes where possible
  - Use Statistics NZ standard editing tools, wherever possible, to achieve standardisation

Editing system of AES in MEP
• Uses Banff to automate and standardise editing and imputation processes
• Uses analytical views to assess the quality of the edited data

Challenges and solutions
A. Sheer volume of data
  - 28 questionnaires, 113 industries and 180 variables
• Solution: Use of a “thin slice” approach
  - Restrict dataset to one questionnaire and one industry to show all stages of E&I are working
  - Once successful, expand dataset to include more industries until all 28 questionnaires are replicated
  - Successful in determining optimal level of automation for correcting failed edits
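The “thin slice” idea can be sketched as a filter that is widened step by step. This is only an illustration: the record layout, the field names `questionnaire` and `industry`, and the codes used are hypothetical and are not taken from the MEP.

```python
def thin_slice(records, questionnaires, industries):
    """Keep only the records that fall inside the current slice."""
    return [
        r for r in records
        if r["questionnaire"] in questionnaires and r["industry"] in industries
    ]


# Hypothetical unit records; real AES data has many more fields.
records = [
    {"id": 1, "questionnaire": "Q01", "industry": "A01"},
    {"id": 2, "questionnaire": "Q01", "industry": "B02"},
    {"id": 3, "questionnaire": "Q02", "industry": "A01"},
]

# Start with one questionnaire and one industry, then widen the slice
# once every E&I stage has been shown to work on the narrow one.
first_slice = thin_slice(records, {"Q01"}, {"A01"})
wider_slice = thin_slice(records, {"Q01"}, {"A01", "B02"})
```

Keeping the slice definition in one place means the same E&I run can be repeated unchanged as the slice grows.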

Challenges and solutions
B. Determining which variable is erroneous when groups of variables must add or subtract to a total
  - Banff's “errorloc” procedure always recommends changing one variable by a large amount
  - The change is made by the “deterministic” procedure
• Solution: Assign weights to variables
  - Assign lower weights to more reliable variables so Banff doesn’t change their values
  - Examples: totals; gross profit, since respondents use this to determine the tax they pay
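To see why localising the error is ambiguous, the sketch below takes one record that fails a balance edit and lists the value each variable would need if it alone were changed to restore the balance. It is not Banff's errorloc algorithm and does not use weights. The variable names and figures are hypothetical, and signs are assumed to be +1 or -1.

```python
def single_variable_fixes(record, components, signs, total_name):
    """For a failed balance edit, return the value each variable would
    take if it were the only one changed to make the edit pass."""
    computed = sum(signs[c] * record[c] for c in components)
    gap = record[total_name] - computed
    if gap == 0:
        return {}  # the edit passes, nothing to localise
    fixes = {total_name: computed}  # option 1: change the total
    for c in components:
        # With signs of +1 or -1, moving one component by sign * gap
        # closes the gap while leaving every other variable untouched.
        fixes[c] = record[c] + signs[c] * gap
    return fixes


# Hypothetical record: sales - cost_of_sales should equal gross_profit.
record = {"sales": 1000, "cost_of_sales": 400, "gross_profit": 900}
fixes = single_variable_fixes(
    record,
    components=["sales", "cost_of_sales"],
    signs={"sales": 1, "cost_of_sales": -1},
    total_name="gross_profit",
)
```

Each candidate is a single large change, and the arithmetic alone cannot say which one is right. That is the gap the variable weights are meant to fill.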

Challenges and solutions
C. Outlier detection
  - Old system detects outliers in 3 key variables but unlinks the whole unit (all variables)
  - Banff does univariate outlier detection
• Solution: Compared 2 E&I runs of the data
  - 1st run had only the 3 key variables set as outliers and 2nd had all variables included in outlier steps
• Decision: Choose variables to be set as outliers based on the effect on the totals
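The comparison of effects on totals can be sketched per variable as follows. The quartile-fence rule here is a generic stand-in and is not Banff's outlier method. The multiplier `k`, the variable names and the values are all assumptions made for illustration.

```python
from statistics import quantiles


def flag_outliers(values, k=3.0):
    """Flag values outside quartile-based fences (illustrative rule only)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v < lower or v > upper for v in values]


def totals_with_and_without_outliers(values, k=3.0):
    """Return (total of all values, total of values not flagged)."""
    flags = flag_outliers(values, k)
    kept = sum(v for v, is_outlier in zip(values, flags) if not is_outlier)
    return sum(values), kept


# Hypothetical reported values for two variables across six units.
reported = {
    "sales": [10, 12, 11, 13, 12, 500],
    "interest": [5, 6, 5, 7, 6, 6],
}
effects = {
    name: totals_with_and_without_outliers(vals)
    for name, vals in reported.items()
}
```

A variable whose two totals differ sharply is a candidate for the outlier steps. A variable whose totals barely move may not be worth including.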

Challenges and solutions
D. Running imputation one variable at a time would have been very time-consuming
• Solution: Group variables
  - By imputation method (4 methods)
  - By industry (some industries have different characteristics)
  - By type of variable (e.g. some variables can be negative)
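One way to picture the grouping is to key each variable by the three attributes listed above, so that every group can be imputed in a single pass. The metadata below is hypothetical. The slides do not name the four imputation methods, so the method labels are placeholders.

```python
from collections import defaultdict

# Hypothetical variable metadata; method labels are placeholders.
variables = [
    {"name": "sales", "method": "method_1", "industry_group": "general", "can_be_negative": False},
    {"name": "purchases", "method": "method_1", "industry_group": "general", "can_be_negative": False},
    {"name": "net_profit", "method": "method_1", "industry_group": "general", "can_be_negative": True},
    {"name": "premiums", "method": "method_2", "industry_group": "insurance", "can_be_negative": False},
]


def group_variables(variables):
    """Group variable names by (method, industry group, sign behaviour)."""
    groups = defaultdict(list)
    for v in variables:
        key = (v["method"], v["industry_group"], v["can_be_negative"])
        groups[key].append(v["name"])
    return dict(groups)


groups = group_variables(variables)
```

Each key then corresponds to one imputation call covering several variables, rather than one call per variable.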

Challenges and solutions
E. Imputation failed for some variables
  - Some imputation cells were too small
• Solution: Merged small imputation cells
  - Each imputation stage was run twice, the first without cell merging and the second with cell merging, resulting in 8 imputation stages
  - Use of a “catch-all” stage at the end (9th stage) to carry out mean imputation by industry
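The fallback logic (fine cell, then merged cell, then a catch-all by industry) can be sketched as below. This is a simplification under stated assumptions. Mean imputation is used at every stage, although the slides only specify it for the catch-all. The cell definitions, the size bands, the minimum donor count of 3 and the variable name `sales` are all hypothetical.

```python
from statistics import mean

# Hypothetical merging rule: small and medium bands share one cell.
MERGED_BAND = {"small": "small_or_medium", "medium": "small_or_medium", "large": "large"}

# (stage name, cell key function, minimum number of donors required)
STAGES = [
    ("fine cell", lambda r: (r["industry"], r["band"]), 3),
    ("merged cell", lambda r: (r["industry"], MERGED_BAND[r["band"]]), 3),
    ("catch-all", lambda r: (r["industry"],), 1),
]


def impute(records, var):
    """Return {id: (value, stage)} using the first stage with enough donors."""
    donors = [r for r in records if r[var] is not None]
    result = {}
    for r in records:
        if r[var] is not None:
            result[r["id"]] = (r[var], "reported")
            continue
        result[r["id"]] = (None, "not imputed")
        for stage, key, min_donors in STAGES:
            pool = [d[var] for d in donors if key(d) == key(r)]
            if len(pool) >= min_donors:
                result[r["id"]] = (mean(pool), stage)
                break
    return result


records = [
    {"id": 1, "industry": "A", "band": "small", "sales": 100},
    {"id": 2, "industry": "A", "band": "small", "sales": 120},
    {"id": 3, "industry": "A", "band": "medium", "sales": 140},
    {"id": 4, "industry": "A", "band": "large", "sales": 1000},
    {"id": 5, "industry": "A", "band": "small", "sales": None},
    {"id": 6, "industry": "A", "band": "large", "sales": None},
]
imputed = impute(records, "sales")
```

Recording the stage alongside each value shows how often the merged and catch-all stages are needed, which is one indication of whether the cells are too fine.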

Challenges and solutions
F. Challenges with no solutions
  - Analysis of improvements in the E&I was slow, as it took several hours to run E&I and write back to the main data storage area to view data in a cube
  - Attempt to replicate published results as closely as possible created a dilemma: When to stop trying?
  - What was the “right” answer?

SEVANI
• Provided a standardised and automated method to report on estimates of variances due to sampling as well as non-response and imputation
• Challenges:
  - Can produce output for only one variable at a time
  - SEVANI required a lot of parameters to set up
  - MEP is unit-based so can’t easily output SEVANI results
• Solution:
  - Use of a macro to identify variable names
  - Created SAS code to set up parameters
  - Output SEVANI results outside MEP

Next steps
• Educate the users of the new system on MEP
• Identify potential areas to make improvements in the editing and imputation system
• Create a new MEP collection for Charities data to include its own editing and imputation system
