The Challenge of Integrating New Surveys into an Existing Business Survey Infrastructure. Éric Pelletier Statistics Canada ICES-III Montréal, Québec, Canada June 18-21, 2007. Outline. Introduction to the Unified Enterprise Survey (UES) Culture surveys environment Integration steps to UES
The Challenge of Integrating New Surveys into an Existing Business Survey Infrastructure Éric Pelletier Statistics Canada ICES-III Montréal, Québec, Canada June 18-21, 2007
Outline • Introduction to the Unified Enterprise Survey (UES) • Culture surveys environment • Integration steps to UES • From Culture to UES: Frame, sampling, etc. • Special case: Film Production survey • UES Estimation process • Back-casting for the previous two years • Conclusion and future work
Unified Enterprise Survey (UES) • UES comprises many business surveys which use unified concepts and processes • 1997: 7 surveys • … • 2005: 45 surveys • 2006: 54 surveys • 2007: 62 surveys • The goal of UES: produce reliable estimates at the provincial and industrial levels
Objectives of the UES • Promote an increasing use of tax data • Reduce the cost of the surveys • Reduce the response burden • Produce estimates for the financial variables (revenue, expenses, salaries and wages, etc.) and non-financial variables for all UES industrial sectors
UES Sampling Process • Sampling frame: Business Register of Statistics Canada (list of establishments) • Sampling unit: Within a given enterprise, a cluster of establishments within the same province and industrial group • For example: establishments A and B of one enterprise, in the same province and industry, form one sampling unit • Simple units (activity in one province and one industry) and complex units
UES Sampling Process • Stratification: • Province, Industry, Revenue • Strata: • 1 take-all stratum • 2 take-some strata • 1 take-none stratum (units below the exclusion thresholds; tax data are used) • Exclusion thresholds: • Separate the take-none units from the take-some units (no questionnaire is sent to the take-none units)
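The stratum structure above can be sketched in a few lines of Python. This is only an illustration: the function name, the three revenue boundaries and the stratum labels are hypothetical, and in practice the boundaries would be set separately for each province-by-industry cell.

```python
def assign_stratum(revenue, exclusion_threshold, take_some_boundary, take_all_threshold):
    """Assign one sampling unit to a revenue stratum within a province-by-industry cell.

    All three boundaries are hypothetical parameters chosen for illustration.
    """
    if revenue < exclusion_threshold:
        return "take-none"    # no questionnaire is sent; tax data are used
    if revenue >= take_all_threshold:
        return "take-all"     # selected with certainty
    if revenue >= take_some_boundary:
        return "take-some-2"  # upper take-some stratum
    return "take-some-1"      # lower take-some stratum
```

For example, with boundaries of 30, 200 and 1000 (arbitrary units), a unit with revenue 150 would fall into the lower take-some stratum.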
UES Sample Design [Figure: UES sample design. Labels: T1 (unincorporated), T2 (corporations); Take-alls; Take-some (Stratum=2, Stratum=1); Take-none; Survey; Tax]
UES schedule • For example, for reference year 2006 (RY2006): • Sampling: October 2006 • Collection: February to October 2007 • Edit & Imputation: July 2007 to December 2007 • Estimation: November 2007 to March 2008 • The estimates are produced within 15 months (January 2007 to March 2008) • The estimation is done one year after the selection of the sample
Culture surveys environment • ‘Activity’ based frames (e.g. list of books) • Census surveys • Occasional surveys (annual surveys, not necessarily every year) • Maintained by Culture Division • The Culture Streamlining Initiative was put in place to reduce the duplication in annual survey processes while promoting the use of the business survey infrastructure
Culture environment versus UES environment • In the UES, the frame is based on industrial structure (economic survey) rather than activity (e.g. list of books, list of films, etc.) • For the analysts, it’s a change in the way they are analysing the data • More flexibility in the UES environment • All the steps of a survey were compared to facilitate the integration
Advantages of the integration to UES • Common methodologies for all annual enterprise surveys • Possible to adapt some of the parameters for the needs of the surveys (at the sampling, imputation or estimation process) • Infrastructure was established in 1997 with the Enterprise Statistics Division • Relatively easy to integrate new surveys
Integration of surveys into UES • Two sets of surveys: • “Wave 1” surveys in RY2006 (Book Publishers, Heritage Institutions and Performing Arts) • “Wave 2” surveys in RY2007 (Film Distribution, Film Production, Film Post-Production, Movie Theatres and Sound Recording) • Integration in two steps: • Step 1: From culture environment to industry-based survey, the years before UES (called “UES_lite”) • Step 2: Integration to UES
“UES_lite” environment • Concepts are similar to the UES surveys • The processing is done outside the UES infrastructure • The surveys are processed by the subject matter division and the methodology division • As opposed to UES processing, which is primarily handled by another Statistics Canada division called the Enterprise Statistics Division
From Culture to UES • Sampling, Frame: • Culture: Census - ‘Activity’ based • “UES_lite”: Sample - Establishments • UES: Sample - Establishments within the same enterprise, same province, same industry code • The analysts were able to create reconciliation files between the frames • Some other minor differences
Special case: Film Production survey • Collection: • Special case with the Film Production survey for RY2005 • The Business Register (BR) is not up-to-date enough for this survey • Links were discovered between the sampled establishments and establishments outside the sampling frame
Special case: Film Production survey • Pre-contact was done for all the units • Approximately 400 units were added to the sample (these units were not on the Business Register) • Indirect sampling was used to address this problem • A different estimation program was created for this survey
UES and “UES_lite” Estimation Process • Total estimate = Survey portion + Tax portion • Survey portion: • Horvitz-Thompson estimator • Outlier detection and treatment • Final weight calculation • Tax portion (take-none portion): • Below the exclusion thresholds: Tax data • Domain estimations: Industry, Province, etc. • Variance and coefficient of variation (CV)
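A minimal Python sketch of the estimation steps listed above. It assumes simple random sampling without replacement within each stratum and treats the tax portion as a known constant with no sampling variance; both are simplifying assumptions, and the function names and data layout are hypothetical rather than taken from the UES estimation program. Outlier treatment is not shown.

```python
import math

def ht_total(strata):
    """Horvitz-Thompson total and its variance estimate for the survey portion.

    strata: list of dicts with "N" (population count) and "y" (sampled values).
    Assumes simple random sampling without replacement in each stratum.
    """
    total = 0.0
    variance = 0.0
    for s in strata:
        N, y = s["N"], s["y"]
        n = len(y)
        total += (N / n) * sum(y)  # design weight N/n applied to each value
        # Take-all strata (n == N) add no sampling variance; strata with a
        # single sampled unit are skipped because s2 cannot be computed.
        if 1 < n < N:
            mean = sum(y) / n
            s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
            variance += N * N * (1 - n / N) * s2 / n
    return total, variance

def coefficient_of_variation(estimate, variance):
    """CV = standard error divided by the estimate."""
    return math.sqrt(variance) / estimate

# Hypothetical data: one take-all stratum and one take-some stratum.
strata = [
    {"N": 2, "y": [100.0, 200.0]},
    {"N": 10, "y": [10.0, 20.0]},
]
survey_portion, var = ht_total(strata)
tax_portion = 50.0  # take-none total from tax data (illustrative value)
total_estimate = survey_portion + tax_portion
cv = coefficient_of_variation(total_estimate, var)
```

With these illustrative figures the survey portion is 450 and the total estimate is 500. Domain estimates would follow the same pattern, restricted to the units in each domain.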
Special case: Film Production survey • Estimation: • The Film Production survey RY2005 was a special case • Due to the application of indirect sampling, the inverse probability method was implemented (see Choudhry (2006)) • In brief: • The inverse probability method determines the probability that at least one of the frame sampling units linked to a reporting unit is sampled • The base weight is computed as the inverse of this selection probability
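The idea can be illustrated with a short Python sketch. It assumes that the frame units linked to a reporting unit are selected independently of one another, which is a simplification made here for illustration; the actual computation in Choudhry (2006) is not reproduced, and the function names are hypothetical.

```python
def inclusion_probability(link_probs):
    """Probability that at least one linked frame unit is sampled.

    link_probs: selection probabilities of the frame units linked to one
    reporting unit. Independence between selections is an assumption.
    """
    prob_none = 1.0
    for p in link_probs:
        prob_none *= (1.0 - p)  # probability that none of the linked units is sampled
    return 1.0 - prob_none

def base_weight(link_probs):
    """Base weight = inverse of the selection probability."""
    return 1.0 / inclusion_probability(link_probs)
```

For a reporting unit linked to two frame units, each selected with probability 0.5, the selection probability is 0.75 and the base weight is 1/0.75.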
Special case: Film Production survey • The complex weighting procedure led to the use of replicates in estimating the variance of the estimates • More precisely, the jackknife replication method is used to calculate the variance • The estimates will be produced within the next few weeks: the release date for RY2005 is July 2007 (same release date as the other Wave 2 surveys), a little bit behind schedule…
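As an illustration of replication-based variance estimation, here is a delete-one jackknife for a weighted total in Python. It is a textbook sketch for a single stratum with no finite population correction, not the replication scheme actually used for the survey; the function name is hypothetical.

```python
def jackknife_variance(values, weights):
    """Delete-one jackknife variance of a weighted total (single stratum).

    Each replicate drops one unit and rescales the remaining weights by
    n / (n - 1). Requires at least two units.
    """
    n = len(values)
    if n < 2:
        raise ValueError("the jackknife needs at least two units")
    replicates = []
    for drop in range(n):
        estimate = sum(
            w * n / (n - 1) * y
            for i, (y, w) in enumerate(zip(values, weights))
            if i != drop
        )
        replicates.append(estimate)
    mean_rep = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
```

With values 1, 2, 3 and a common weight of 2, the three replicate totals are 15, 12 and 9, giving a variance estimate of 12.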
Special case: Film Production survey • The Film Production survey for RY2007 (integration year in UES) could not be put into the UES process because of: • Cost of the post-selection additions • Timeliness • Different processes, like the jackknife replication method for the variance calculations • Instead of the inverse probability method, the weight share method will be used • With this method, we assign an average weight based on the sampled units and the number of links
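A small Python sketch of the weight share idea, in its simplest form: the weights of the sampled frame units linked to a reporting unit are summed and divided by the total number of links to that unit. The function name and inputs are hypothetical, and the adapted version derived for the UES (Beaumont (2007)) is not reproduced here.

```python
def weight_share(sampled_link_weights, total_links):
    """Weight of one reporting unit under a simple weight share rule.

    sampled_link_weights: design weights of the sampled frame units linked
        to the reporting unit.
    total_links: number of frame units linked to it, sampled or not.
    """
    if total_links <= 0:
        raise ValueError("a reporting unit needs at least one link to the frame")
    return sum(sampled_link_weights) / total_links
```

For example, a reporting unit linked to three frame units, two of which were sampled with weights 10 and 4, receives a weight of 14/3.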
Special case: Film Production survey • The weight share method cannot be integrated directly into the UES process • A way to integrate the weight share method into the UES process was derived (see Beaumont (2007)) • With this derivation, the method can be adapted to the regular UES estimation program • Compared with the inverse probability method, the weight share method is expected to give a slight increase in the variance • This “special” integration will be done at the end of 2007 / beginning of 2008
Estimation – Back-casting • For RY2005 (first year in “UES_lite”) for the “Wave 2” surveys, the previous estimates were produced in the Culture environment • As was previously shown, the frame is different for RY2005 (Business Register) • Potential break in the series • Back-casting procedure is used to reproduce historical estimates using the Business Register
Estimation – Back-casting • Back-casting is done for the two previous reference years (for example, RY2003 and RY2002) • A match between the units from the RY2005 sample and the units from the previous culture files is done using the reconciliation files • If the unit is not matched to the previous year’s culture files, the data is imputed
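The matching step can be pictured with the following Python sketch. The data structures are hypothetical: the reconciliation file is represented as a mapping from Business Register identifiers to culture-file identifiers, and the imputation itself is not shown.

```python
def match_units(sample_ids, reconciliation, culture_data):
    """Split sampled units into matched units and units needing imputation.

    sample_ids: identifiers of the units in the RY2005 sample.
    reconciliation: dict mapping a sample identifier to a culture-file identifier.
    culture_data: dict mapping a culture-file identifier to its historical record.
    """
    matched = {}
    to_impute = []
    for unit in sample_ids:
        culture_id = reconciliation.get(unit)
        if culture_id is not None and culture_id in culture_data:
            matched[unit] = culture_data[culture_id]  # historical data found
        else:
            to_impute.append(unit)  # no historical record: data must be imputed
    return matched, to_impute
```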
Estimation – Back-casting • Adjustments to the weights will be done based on the population counts from the Business Register for the two back-casting years (for example, RY2003 and RY2002) • Estimates are produced by domains, and the CVs are calculated for the two back-casting years for the “Wave 2” surveys (release date is July 2007)
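One simple way to picture such a weight adjustment is a ratio adjustment within a stratum, so that the weights sum to the Business Register population count for the back-casting year. This Python sketch shows that generic technique only; it is an assumption about the form of the adjustment, not a description of the procedure actually applied.

```python
def adjust_weights(weights, population_count):
    """Rescale the weights in one stratum so that they sum to the population count."""
    factor = population_count / sum(weights)
    return [w * factor for w in weights]
```

For example, weights of 2, 2 and 4 adjusted to a population count of 12 become 3, 3 and 6.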
Infrastructure - Processing • One of the main challenges in the integration of those surveys is the communication between the three parties: • Methodology division (responsible for the survey methods) • Subject matter division (responsible for the content, the analysis and the publication) • Enterprise Statistics Division (responsible for the business survey infrastructure) • Started in October 2006, the process will be completed in March 2009
Conclusion and future work • Presently, three “Wave 1” surveys are being integrated into UES for RY2006 (sample was selected in October 2006, estimation is being prepared) • Next year, for RY2007, the “Wave 2” surveys will be integrated • Because of the infrastructure, some modifications will be made to the UES estimation program for the Film Production survey, in order to integrate this survey into UES
Thanks • Special thanks to everyone who worked on those surveys, and who helped me in the preparation of this presentation
Éric Pelletier (613) 951-5213 eric.pelletier@statcan.ca