Survey of Electronic Commerce and Technology: Past, Present and Future Challenges. Jason Raymond. Third International Conference on Establishment Surveys June 2007. Outline. Description of the survey Methodology Improvements to the sample design Weighted Outliers Future challenges.
Survey of Electronic Commerce and Technology: Past, Present and Future Challenges Jason Raymond Third International Conference on Establishment Surveys June 2007
Outline • Description of the survey • Methodology • Improvements to the sample design • Weighted Outliers • Future challenges
Description of the survey • Annual survey in place since 1999 • Cross-economy survey • Some exceptions at sub-industry level • Domains of interest: • NAICS, SIZE (number of employees)
Description of the survey • Two-page questionnaire with questions on: • Use of information and communications technologies (Internet, intranet, web site, …) • Use of electronic commerce for the purchase and sale of goods and services • Barriers to electronic commerce • Types of questions: • Mostly categorical • Some numerical • total sales over Internet • percentages
Methodology • Sampling • Universe • Statistics Canada’s Business Register • List of public units • Target population • Fixed thresholds of exclusion: • $100,000 or $250,000 in gross business income depending on industry • Covers approximately 95% of income in each industry • around 700,000 businesses
Methodology • Sampling • Stratification • NAICS3, NAICS4 • Size: • 0 to 19 employees (take-some stratum) • 20 to 99 employees (take-some stratum) • 100 to 499 employees (take-some stratum) • 500 employees or more (take-all stratum) • Public/private sector
Methodology • Sampling • Neyman allocation • Sample Selection • Sample size: around 19,000 enterprises • Maximum overlap between two consecutive years: • Kish and Scott method (1971) • Approximately 70% overlap
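The Neyman allocation named on this slide can be sketched as follows. This is a minimal illustration of the textbook rule (sample size proportional to N_h × S_h), not the survey's production code. The stratum counts, the standard deviations and the function name `neyman_allocation` are hypothetical, and the take-all stratum is assumed to be handled separately.

```python
def neyman_allocation(n_total, strata):
    """Split n_total across take-some strata in proportion to N_h * S_h.

    strata maps a stratum label to (N_h, S_h): the population count and the
    standard deviation of the design variable in that stratum.
    Returns real-valued sample sizes; rounding and capping at N_h are left out.
    """
    products = {h: N_h * S_h for h, (N_h, S_h) in strata.items()}
    denominator = sum(products.values())
    return {h: n_total * p / denominator for h, p in products.items()}


# Hypothetical size strata within one industry (values invented for illustration).
strata = {
    "0-19 employees": (6000, 1.0),
    "20-99 employees": (3000, 2.0),
    "100-499 employees": (1000, 6.0),
}
allocation = neyman_allocation(900, strata)
for label, n_h in allocation.items():
    print(label, round(n_h))
```

In this invented example the three strata receive equal sample sizes even though their populations differ sixfold, because the smaller strata are assumed to be more variable.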
Methodology • Outlier detection • Variables: • Sales over Internet • Year over year difference for sales over Internet • Method: • Variant of sigma gap • Distance measure between observations
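The slide says only that a variant of the sigma gap method is used, so the sketch below shows the basic sigma gap idea rather than the survey's variant: sort the values, then flag everything above the first gap between neighbours that exceeds a multiple of the standard deviation. The multiplier `k`, the starting point at the median and the sample data are assumptions made for illustration.

```python
import statistics


def sigma_gap_outliers(values, k=1.0, start_quantile=0.5):
    """Return the values lying above the first large gap in the sorted data.

    A gap is "large" when it exceeds k times the standard deviation of all
    values. The search starts at start_quantile so that gaps in the lower
    part of the distribution are ignored.
    """
    ordered = sorted(values)
    if len(ordered) < 3:
        return []
    sigma = statistics.pstdev(ordered)
    if sigma == 0:
        return []
    start = max(int(len(ordered) * start_quantile), 1)
    for i in range(start, len(ordered)):
        if ordered[i] - ordered[i - 1] > k * sigma:
            return ordered[i:]  # everything above the gap is flagged
    return []


# Hypothetical Internet sales values (thousands of dollars).
sales = [10, 12, 15, 18, 20, 22, 25, 400]
flagged = sigma_gap_outliers(sales)
print(flagged)
```

The same function could be applied to year-over-year differences, the second variable listed on the slide.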
Methodology • Partial nonresponse (8.3%) imputation • Deductive (1%) • Historical (0.1%) • Administrative (0.02%) • Donor (7.2%) • Total nonresponse (31%) reweighting
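The reweighting for total nonresponse can be illustrated with a simple weight adjustment within cells: respondents' weights are inflated so that they also represent the nonrespondents in the same cell. The slide does not say how the adjustment cells are defined, so the `cell` key, the data and the function name are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjusted_weights(units):
    """Inflate respondent weights within each adjustment cell.

    Each unit is a dict with keys 'cell', 'weight' and 'responded'.
    Respondents get weight * (sum of sampled weights / sum of respondent
    weights) for their cell; nonrespondents get a weight of 0.
    """
    sampled = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        sampled[u["cell"]] += u["weight"]
        if u["responded"]:
            responding[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = sampled[u["cell"]] / responding[u["cell"]]
            adjusted.append(u["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: one nonrespondent in cell A.
units = [
    {"cell": "A", "weight": 10.0, "responded": True},
    {"cell": "A", "weight": 10.0, "responded": False},
    {"cell": "A", "weight": 10.0, "responded": True},
    {"cell": "B", "weight": 5.0, "responded": True},
]
weights = nonresponse_adjusted_weights(units)
print(weights)
```

The adjusted weights sum to the same total as the original design weights, so estimated population counts are preserved.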
Methodology • Estimation using Statistics Canada’s Generalized Estimation System (GES) • Types of estimates • Means • Totals • Proportions • Ratios • Data quality measures based on CVs and imputation rates
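To show what a CV-based quality measure looks like, here is a sketch of an estimated total and its coefficient of variation under stratified simple random sampling. It uses the textbook variance formula and does not reproduce GES; nonresponse and imputation variance are ignored, and the data and function name are hypothetical.

```python
import math
import statistics


def stratified_total_and_cv(strata):
    """Estimate a total and its CV under stratified simple random sampling.

    strata is a list of (N_h, sample_values) pairs. A stratum whose sample
    covers the whole population (take-all) adds no sampling variance.
    """
    total = 0.0
    variance = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        total += N_h * statistics.mean(sample)
        if n_h < N_h:
            s2 = statistics.variance(sample)  # needs at least 2 observations
            variance += N_h ** 2 * (1 - n_h / N_h) * s2 / n_h
    return total, math.sqrt(variance) / total


# Hypothetical data: one take-some stratum and one take-all stratum.
strata = [
    (100, [1, 2, 3, 4, 5]),  # 5 units sampled out of 100
    (2, [50, 70]),           # both units surveyed
]
total, cv = stratified_total_and_cv(strata)
print(round(total, 1), round(cv, 3))
```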
Improvements to the sample design • When? • Current sample design tested in 2004 in parallel with original design and adopted in 2005 • Why? • Improve the comparability of estimates over time • Need for estimates by size of enterprise
Improvements to the sample design • Target population • Original sampling design: • Units accounting for 95% of the total income • Drawback: Unstable population over time • New sampling design • Fixed thresholds of exclusion: $100,000 or $250,000 depending on the industry
Improvements to the sample design • Stratification and allocation • Original sampling design • NAICS3, NAICS4 • Lavallée-Hidiroglou: 2 take-some strata and 1 take-all stratum • Auxiliary variable: GROSS BUSINESS INCOME • Drawback: Not efficient for estimates by size (Number of employees)
Improvements to the sample design • New sampling design • Stratification: • NAICS3, NAICS4 • Size: • 0 to 19 employees (take-some stratum) • 20 to 99 employees (take-some stratum) • 100 to 499 employees (take-some stratum) • 500 employees or more (take-all stratum) • Public/private • Neyman allocation
Weighted Outliers • Only a small proportion of firms sell over the Internet (8% of the private sector and 16% of the public sector) • Moderate values with large weights can significantly influence estimates • Previously, outlier detection was applied only to unweighted values of sales over the Internet
Weighted Outliers • Weighted outlier detection and treatment implemented in 2006 • Same detection method as for unweighted values (variant of sigma gap method) • Treatment methods studied • Hidiroglou/Srinath • Winsorization • Dalén and Tambay • Promotion to own stratum
Weighted Outliers • Hidiroglou/Srinath (1981) • Weight reduction method • Minimizes the MSE of the estimator of the total • Requires population characteristics that are unknown and may not be reliably estimable
Weighted Outliers • Winsorization • Reduces values larger than a certain cutoff to the cutoff itself (the cutoff depends on the outlier detection method) • Modified here into a weight reduction method
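One way to read Winsorization as a weight reduction method is to leave the reported value untouched and shrink the weight so that the weighted contribution equals what the cutoff would have contributed. The slide does not give the exact formula used, so the sketch below is an assumption; the cutoff is taken on the unweighted value scale, and the numbers are invented.

```python
def winsorized_weight(value, weight, cutoff):
    """Return a reduced weight w* such that w* * value == weight * cutoff.

    Units at or below the cutoff keep their original weight. No floor is
    applied here, so a very large value can push the weight below 1.
    """
    if value <= cutoff:
        return weight
    return weight * cutoff / value


# Hypothetical outlier: value 500, weight 20, cutoff 100.
new_weight = winsorized_weight(500.0, 20.0, 100.0)
print(new_weight, new_weight * 500.0)
```

Expressing the treatment as a weight change keeps the reported value intact on the data file, which is convenient when the same record feeds several estimates.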
Weighted Outliers • Dalén (1987) and Tambay (1988) • Cross between Winsorization and weight reduction • The cutoff for weighted outlier detection is determined for each stratum • Outlier value is split into two parts: • Portion less than the cutoff, which receives the same new weight as the non-outliers • Portion greater than the cutoff, which is allocated a weight of 1
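The split described on this slide can be sketched as below: the part of the value up to the cutoff is weighted normally, and the excess over the cutoff represents only itself. The slide mentions a "new weight" shared with the non-outliers, which implies a reweighting step that is not detailed; this sketch simply uses the weight passed in. The function names and numbers are hypothetical.

```python
def dalen_tambay_contribution(value, weight, cutoff):
    """Weighted contribution of one unit after splitting at the cutoff.

    The portion up to the cutoff is multiplied by the weight; the portion
    above the cutoff is multiplied by 1.
    """
    if value <= cutoff:
        return weight * value
    return weight * cutoff + (value - cutoff)


def equivalent_weight(value, weight, cutoff):
    """Single weight that reproduces the split contribution for this value."""
    return dalen_tambay_contribution(value, weight, cutoff) / value


# Hypothetical outlier: value 500, weight 20, cutoff 100.
contribution = dalen_tambay_contribution(500.0, 20.0, 100.0)
print(contribution, equivalent_weight(500.0, 20.0, 100.0))
```

With these numbers the untreated contribution would be 10,000 and capping the value at the cutoff would give 2,000, so the split lands between the two, closer to the capped value.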
Weighted Outliers • Promotion to own stratum • Outliers assigned a weight of 1 • Remaining units in stratum have their weights adjusted • Outlier represents only itself during estimation
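Promotion to own stratum can be illustrated for a stratum sampled by simple random sampling: each outlier gets a weight of 1, and the remaining sampled units share the rest of the stratum population. The assumption of equal design weights N_h / n_h with no nonresponse adjustment is mine, as are the function name and the numbers.

```python
def promote_outliers(N_h, n_h, n_outliers):
    """Return (outlier_weight, remaining_weight) after promotion.

    Outliers represent only themselves. The other n_h - n_outliers sampled
    units represent the remaining N_h - n_outliers population units.
    """
    if not 0 <= n_outliers < n_h:
        raise ValueError("n_outliers must be between 0 and n_h - 1")
    outlier_weight = 1.0
    remaining_weight = (N_h - n_outliers) / (n_h - n_outliers)
    return outlier_weight, remaining_weight


# Hypothetical stratum: 200 businesses, 20 sampled, 1 outlier detected.
w_out, w_rest = promote_outliers(200, 20, 1)
print(w_out, round(w_rest, 3))
```

The weights still sum to the stratum population (1 + 19 × 199/19 = 200), but the outlier's weight drops from 10 to 1, which is why the slides describe this treatment as more drastic than the other options.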
Weighted Outliers • Implemented method: Dalén and Tambay • Fewer assumptions • Good compromise • Impact of outliers on the estimates is reduced • Not as drastic as promotion to own stratum • Method performed well using 2005 data • Additional empirical studies needed to confirm the effectiveness of the method (simulations?)
Future challenges • Response burden • Maximising overlap = increased response burden? • Minimal effect on response rates • Conditioning effect? • Sample rotation: • Ease response burden • Control sample overlap for longitudinal analysis
Future challenges • Statistics Canada’s Business Register redesign • Sampling elements based on operating structure vs. statistical structure • Certain modeled variables replaced by administrative data
Jason Raymond 613-951-1917 Jason.Raymond@statcan.ca