This presentation describes how Statistics Canada uses tax data in the Annual Survey of Manufactures (ASM). It covers the availability of and access to tax data, the survey methodology, and the use of administrative data for estimation and imputation. It also presents two analytical studies: one assessing whether survey data can be replaced by tax data, and one measuring the impact of the tax data strategy on estimates.
Use of Administrative Data in Statistics Canada’s Annual Survey of Manufactures
Steve Matthews and Wesley Yung
May 16, 2004
The United Nations Statistical Commission and Economic Commission for Europe Conference of European Statisticians
Outline
• Introduction
• Tax data programs at Statistics Canada
• The Annual Survey of Manufactures (ASM)
  • Overview
  • Strategy for use of tax data
• Analytical studies
• Conclusions and future work
Introduction
• Desire to increase the use of tax data, in order to:
  • Reduce respondent burden
  • Reduce survey costs
• Tax data can be used at many stages of the survey process:
  • Stratification
  • Survey data validation
  • Edit and imputation
  • Estimation
Tax data programs at Statistics Canada
• Tax data available to Statistics Canada
  • Collected by the Canada Revenue Agency (CRA)
  • Accessed via a data-sharing agreement
  • To be used only for statistical purposes
• Two extensive tax data programs
  • Unincorporated businesses (T1)
  • Incorporated businesses (T2)
Tax data programs at Statistics Canada (cont’d)
• T1
  • Population
    • Unincorporated businesses
    • Account for a small share of revenues
  • Administrative data
    • Sample-based
    • Limited set of variables
    • Edit and imputation applied
    • Weighted, benchmarked estimates
Tax data programs at Statistics Canada (cont’d)
• T2
  • Population
    • Incorporated businesses
    • Account for a large share of revenues
  • Administrative data
    • Census-based
    • Extensive set of variables
    • Edit and imputation applied
    • Micro-data produced
The Annual Survey of Manufactures
• Manufacturing is an important sector of the Canadian economy (~17% of GDP)
• Annual Survey of Manufactures
  • Take-none portion and survey portion
  • Extensive questionnaire (financial and commodity)
  • Data requirements (pseudo-census)
The Annual Survey of Manufactures (cont’d)
• Target population
  • Drawn from Statistics Canada’s Business Register (BR)
  • All businesses classified to manufacturing
• Sample design
  • Non-survey portion
    • Administrative data
  • Survey portion
    • Stratified SRS (stratum = NAICS × Province × Size)
    • Small take-some / large take-some / take-all
    • Collected via mail-out/mail-back, with telephone follow-up
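Under a stratified SRS design, each sampled business in stratum h carries a design weight of N_h / n_h, and an estimated total is the weighted sum of sampled values (the Horvitz-Thompson estimator). The Python sketch below illustrates this under simplifying assumptions: full response, no imputation, and hypothetical stratum keys, counts and values. It is not the ASM production system.

```python
def design_weights(frame_counts, sample_counts):
    """SRS design weight N_h / n_h for each stratum h.

    frame_counts  -- stratum key -> N_h, businesses on the frame
    sample_counts -- stratum key -> n_h, businesses sampled
    In a take-all stratum n_h == N_h, so the weight is 1.
    """
    return {h: frame_counts[h] / sample_counts[h] for h in sample_counts}


def ht_total(sample, weights):
    """Horvitz-Thompson estimate of a total: sum of w_h * y_k over the sample.

    sample -- list of (stratum key, y value) pairs
    """
    return sum(weights[h] * y for h, y in sample)


# Hypothetical strata keyed by (NAICS, province, size class).
SMALL = ("311", "ON", "small take-some")
TAKE_ALL = ("311", "ON", "take-all")

frame = {SMALL: 40, TAKE_ALL: 3}
sampled = {SMALL: 4, TAKE_ALL: 3}
weights = design_weights(frame, sampled)  # SMALL -> 10.0, TAKE_ALL -> 1.0

sample = [
    (SMALL, 100.0), (SMALL, 120.0), (SMALL, 80.0), (SMALL, 100.0),
    (TAKE_ALL, 5000.0), (TAKE_ALL, 7000.0), (TAKE_ALL, 6000.0),
]
print(ht_total(sample, weights))  # 22000.0
```

Take-all units contribute their values unweighted, so the largest businesses are measured exactly while the small take-some units are scaled up.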
The Annual Survey of Manufactures (cont’d)
• Edit and imputation
  • Edits applied to ensure accuracy and coherence
  • Extensive imputation to produce a ‘pseudo-census’ dataset
    • Historical imputation
    • Ratio imputation
    • Nearest-neighbour donor imputation
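The three imputation methods listed above can be sketched as small functions. This Python sketch uses hypothetical inputs: the trend factor for historical imputation, the auxiliary variable x used for ratio and nearest-neighbour imputation, and the respondent sets are all assumptions, since the presentation does not specify the ASM imputation classes or auxiliary variables.

```python
def historical_impute(prev_value, trend):
    """Historical imputation: last year's value for the same business,
    scaled by a trend factor (assumed here to come from respondents in
    the same imputation class)."""
    return prev_value * trend


def ratio_impute(x_k, resp_y, resp_x):
    """Ratio imputation: y_k = (sum of y / sum of x over respondents) * x_k,
    where x is an auxiliary variable known for the nonrespondent k."""
    return sum(resp_y) / sum(resp_x) * x_k


def nearest_neighbour_impute(x_k, donors):
    """Nearest-neighbour donor imputation: copy y from the respondent whose
    auxiliary value x is closest to the recipient's x_k.

    donors -- list of (x, y) pairs for respondents in the same class
    """
    _, y = min(donors, key=lambda d: abs(d[0] - x_k))
    return y


print(historical_impute(1000.0, 1.05))                     # about 1050.0
print(ratio_impute(50.0, [200.0, 300.0], [100.0, 150.0]))  # 100.0
donors = [(100.0, 900.0), (130.0, 1300.0), (200.0, 2100.0)]
print(nearest_neighbour_impute(120.0, donors))             # 1300.0
```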
The Annual Survey of Manufactures (cont’d)
• Estimation
  • Non-survey portion (tax data)
    • Total Expenses only
    • T1: weighted domain estimates
    • T2: aggregates from the administrative census dataset
  • Survey portion (survey data and imputed data)
    • Aggregates from the pseudo-census dataset
  • Domains of interest: NAICS and Province
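Domain estimates combine the two portions. Pseudo-census records (survey plus imputed) and T2 records are complete, so they enter with weight 1, while T1 records carry their tax-sample weights. The Python sketch below aggregates weighted values by NAICS × Province; the record layout, field names, weights and values are hypothetical.

```python
from collections import defaultdict


def domain_totals(records):
    """Sum weight * value within each (NAICS, province) domain.

    Each record is a dict with keys 'naics', 'province', 'value', 'weight'.
    Pseudo-census (survey + imputed) and T2 records carry weight 1.0;
    T1 records carry their (hypothetical) sample weight.
    """
    totals = defaultdict(float)
    for r in records:
        totals[(r["naics"], r["province"])] += r["weight"] * r["value"]
    return dict(totals)


records = [
    # Survey portion: pseudo-census dataset (hypothetical values)
    {"naics": "311", "province": "ON", "value": 500.0, "weight": 1.0},
    {"naics": "311", "province": "ON", "value": 300.0, "weight": 1.0},
    {"naics": "311", "province": "QC", "value": 100.0, "weight": 1.0},
    # Non-survey portion, Total Expenses only
    {"naics": "311", "province": "ON", "value": 200.0, "weight": 1.0},  # T2
    {"naics": "311", "province": "ON", "value": 10.0, "weight": 5.0},   # T1
]
print(domain_totals(records))
# {('311', 'ON'): 1050.0, ('311', 'QC'): 100.0}
```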
Analytical Studies
• Motivation for two studies:
  • Which variables should be ‘replaced’?
  • What are the effects of the strategy on final estimates for all variables?
• Study 1: Data comparison
• Study 2: Impact analysis
Analytical Study 1
Study to select appropriate variables
• Comparison of data reported via the survey and via tax returns
  • Simple businesses only
  • Assess suitability for substitution of survey data
• Based on ~6,000 businesses
Analytical Study 1 (cont’d)
• Correlation analysis
  • Wide range of correlations
  • Total Expenses: 0.9
  • Total Energy Expenses: -0.10
• Reporting patterns
  • Share of businesses with the same pattern (zero or positive) in survey and tax data
  • Total Expenses: 99%
  • Total Energy Expenses: 50%
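Both Study 1 measures above can be computed from matched survey and tax values for the same businesses. A minimal Python sketch, assuming values are non-negative and that “same pattern” means both sources report zero or both report a positive amount; the data are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between matched survey and tax values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def same_pattern_rate(survey, tax):
    """Share of businesses where both sources are zero or both are positive."""
    matches = sum((s > 0) == (t > 0) for s, t in zip(survey, tax))
    return matches / len(survey)


survey = [100.0, 0.0, 250.0, 40.0]
tax = [110.0, 0.0, 240.0, 0.0]
print(round(pearson(survey, tax), 3))
print(same_pattern_rate(survey, tax))  # 0.75
```

A variable can show a high correlation yet a poor pattern match (or the reverse), which is why the study looks at both.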
Analytical Study 1 (cont’d)
• Distribution of ratios
  • Examined histograms and the fraction of ratios between 0.9 and 1.1
  • Total Expenses: 60%
  • Total Energy Expenses: 16%
• Population estimates
  • Relative difference between tax-based and survey-based estimates
  • Total Expenses: 3%
  • Total Energy Expenses: 28%
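The ratio and population-estimate comparisons can be sketched the same way. The presentation does not say which source is the numerator, so this Python sketch assumes the ratio is tax / survey, skips businesses with a zero survey value, and reports the relative difference against the survey-based estimate in absolute terms. The data are hypothetical.

```python
def share_within(survey, tax, lo=0.9, hi=1.1):
    """Fraction of tax/survey ratios in [lo, hi], skipping zero survey values."""
    ratios = [t / s for s, t in zip(survey, tax) if s > 0]
    return sum(lo <= r <= hi for r in ratios) / len(ratios)


def relative_difference(tax_estimate, survey_estimate):
    """|tax-based estimate - survey-based estimate| / survey-based estimate."""
    return abs(tax_estimate - survey_estimate) / survey_estimate


survey = [100.0, 200.0, 50.0, 0.0]
tax = [105.0, 150.0, 50.0, 10.0]
print(share_within(survey, tax))  # ratios 1.05, 0.75, 1.0, so 2 of 3 fall in range
print(relative_difference(sum(tax), sum(survey)))
```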
Analytical Study 1 (cont’d)
• Selected several variables for direct substitution
  • Section totals and sub-totals (expenses, revenues, inventories, etc.)
• Remaining variables are imputed
  • Imputation assigns the distribution of details within each total
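One way to read “assign the distribution of details within each total” is proration: the tax value replaces the total, and the details are imputed by spreading that total according to a donor’s reported proportions. The Python sketch below assumes this donor-proportion approach and uses hypothetical detail names; the presentation does not state the exact ASM rule.

```python
def prorate_details(tax_total, donor_details):
    """Impute detail variables so they sum to the substituted tax total.

    donor_details -- detail name -> value reported by a donor business;
    the donor's proportions are applied to tax_total.
    """
    donor_total = sum(donor_details.values())
    if donor_total <= 0:
        raise ValueError("donor must report a positive total")
    return {k: tax_total * v / donor_total for k, v in donor_details.items()}


donor = {"energy": 100.0, "materials": 300.0, "other": 100.0}
print(prorate_details(1000.0, donor))
# {'energy': 200.0, 'materials': 600.0, 'other': 200.0}
```

Because the details are scaled to the tax total, the imputed record stays internally consistent: its details still add up to the substituted total.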
Analytical Study 1 - Conclusions
• Distinctly different results for different variables
• Direct substitution seems feasible for totals
• Direct substitution not recommended for details
• Use standard methods to impute the other variables
Analytical Study 2
Analysis to evaluate the impact of the tax data strategy
• Bias
  • Comparison of estimates from different scenarios
• Variance
  • Shao-Steel approach for variance estimation
  • Reflects variance from both sampling and imputation
  • Assumes equal probability of response within each imputation class
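The two Study 2 comparison measures can be written out as follows. This is a sketch in our own notation, since the presentation gives no formulas: d is a domain, s a scenario, (0) the baseline “HT – No Tax” scenario, and \hat{V} the variance estimate (here assumed to be the Shao-Steel estimate, which includes imputation variance). The relative difference is assumed to be signed. The result tables then report the median of each measure over all domains of a given type.

```latex
\mathrm{RD}_d^{(s)} = \frac{\hat{Y}_d^{(s)} - \hat{Y}_d^{(0)}}{\hat{Y}_d^{(0)}},
\qquad
\mathrm{CV}\bigl(\hat{Y}_d^{(s)}\bigr) = \frac{\sqrt{\hat{V}\bigl(\hat{Y}_d^{(s)}\bigr)}}{\hat{Y}_d^{(s)}}
```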
Analytical Study 2 (cont’d) Scenarios
Analytical Study 2 (cont’d)
Comparison of resulting estimates for Total Expenses
[Table: Relative difference from “HT – No Tax”, Total Expenses]
* Median value for all such domains
Analytical Study 2 (cont’d)
Comparison of estimated CVs for Total Expenses
[Table: Coefficient of variation, Total Expenses]
* Median value for all such domains
Analytical Study 2 (cont’d)
Comparison of resulting estimates for Total Energy Expenses
[Table: Relative difference from “HT – No Tax”, Total Energy Expenses]
* Median value for all such domains
Analytical Study 2 (cont’d)
Comparison of estimated CVs for Total Energy Expenses
[Table: Coefficient of variation, Total Energy Expenses]
* Median value for all such domains
Analytical Study 2 - Conclusions
• Bias
  • Small relative differences between estimated totals from the scenarios
• Variance
  • Relatively low CVs for all options
  • Tax substitution variables: Scenario 3 most efficient
  • Non-tax-substitution variables: Scenario 1 most efficient
• Analytical capabilities
  • Scenarios 2 and 3 provide the most detail
Conclusions
• Results used to select the 2004 strategy, “PC – Tax”
  • Meets the needs of data users
  • Reduces cost and response burden
  • Maintains (or improves) quality
• Striving to further increase the use of tax data
  • Increased portion of the population
  • Increased number of variables
Future Work
• Editing of tax data
  • Approach similar to that used for survey data
  • Potential to expand the list of direct substitution variables
• Indirect use of tax data
  • More adaptive models
• Quality indicators
  • Account for increased variance and potential bias due to imputation