Editing Administrative Data and Combined Data Sources Introduction
Sub-topic: Use of administrative data for business surveys and economic data Papers focus on methods for pre-processing and edit and imputation to obtain high quality administrative data for supporting survey data and incorporating into statistical data. Administrative data is used as a direct statistical source in business surveys and economic censuses by replacing survey data for smaller units, thus reducing costs and response burden. Administrative data also supports the processing of survey data through error localization, imputation models, selective editing techniques and the setting of thresholds.
Use of administrative data for business surveys and economic data • Relevant papers for sub-topic: • WP2 - Use of Administrative Data in Statistics Canada’s Annual Survey of Manufactures, Canada • WP4 - Use and Editing of Administrative Data in the Business Indicators Unit, New Zealand • WP5 - Detecting Outliers in Price Quotes for the Canadian Consumer Price Index, Canada • WP6 - Imputation of External Trade Data in Denmark, Denmark • WP9 - The Use of Administrative Data in the Annual Survey of Retail, Wholesale and Services, United States
Sub-topic: Combining Data Sources Combining multiple administrative data sources may replace the need to carry out some surveys. Target variables can be obtained by direct replacement or modeled using administrative data as covariates. Administrative data supports other statistical processes related to edit and imputation and enhances the dimensions of quality with respect to accuracy, coherence, consistency and completeness. Relevant papers for sub-topic: WP7 - Evaluation of Editing and Imputation Supported by Administrative Records, Israel WP8 - Editing and Imputation for the Creation of a Linked Micro File from Base Registers and Other Administrative Data, Norway
Sub-topic: Other Processes Supporting Edit and Imputation Statistical data are becoming more dependent on combining administrative data with survey data. There is a need to expand processes beyond conventional and traditional methods of data collection. High quality, unambiguous metadata about the administrative data must be fully integrated with the survey data and communicated through every step of the processing operation and especially to end users. Relevant paper for sub-topic: WP3 - Conceptual Modeling of Administrative Register Information and XML - Taxation Metadata as an Example, Finland
Editing Administrative Data and Combined Data Sources Enjoy the Presentations!
Editing Administrative Data and Combined Data Sources Summary of Papers
Use of administrative data for business surveys and economic data • All the papers focus on the use of administrative data for enhancing and improving economic statistical data: • Reduction of costs and response burden; • Improving edit and imputation processes by using administrative data for error localization and imputation models; • Setting thresholds and benchmarks for selective editing techniques. • Examples were shown on the use of tax data and trade data with an emphasis on the need for direct pre-processing and edit and imputation procedures to define timely and accurate target variables that are needed for survey processing.
Use of administrative data for business surveys and economic data • Other main points: • Quality assessment presented in the papers: • Indicators for evaluating definitions, consistency, correlation and distributions between survey variables and administrative data, • Assessment of the effect of edit and imputation procedures on final point estimates and their variance. • The importance of understanding the needs of users to produce “fit for use” data through selective editing techniques compared to “perfect” data through full editing. • Outlier detection as a form of selective editing technique that takes into account the skewed distributions of economic data.
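The point about outlier detection on skewed economic data can be illustrated with a minimal sketch. It is not the method of any of the papers; it assumes strictly positive values, works on the log scale to reduce skewness, and uses the median and the median absolute deviation (MAD) as robust measures. The function name and the cutoff of 3.5 are illustrative assumptions.

```python
import math
from statistics import median


def flag_outliers_log_mad(values, cutoff=3.5):
    """Return one boolean per value: True if the value looks like an outlier.

    Assumption: all values are strictly positive (e.g. prices or turnover),
    so the log transform is defined. The cutoff is an illustrative choice.
    """
    logs = [math.log(v) for v in values]
    centre = median(logs)
    mad = median([abs(x - centre) for x in logs])
    if mad == 0:
        # No spread around the median: nothing can be scored, flag nothing.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the logged data are roughly normal.
    return [abs(0.6745 * (x - centre) / mad) > cutoff for x in logs]


flags = flag_outliers_log_mad([100, 120, 95, 110, 105, 5000])
```

In this example only the last value is flagged. A flagged unit would normally be sent for review rather than corrected automatically.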
Use of administrative data for business surveys and economic data • Other main points: • Imputation models in the papers included the use of historical data, ratio imputation and nearest neighbor donor imputation, as well as imputation on both a micro and macro level. One example was the imputation for statistical units reporting bi-monthly or half-yearly on tax data to obtain timely monthly data. • All papers emphasize the importance of quality and error checks on the final outputs based on the combined administrative and survey data sources.
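As an illustration of ratio imputation with an administrative variable as the auxiliary, the following is a minimal sketch under stated assumptions: each unit has an administrative value (for example a tax-based turnover figure), survey values are missing for some units, and a single ratio is estimated from the responding units. The function name and data layout are hypothetical and not taken from the papers.

```python
def ratio_impute(survey, admin):
    """Fill missing survey values with ratio * administrative value.

    survey: dict mapping unit id -> reported value, or None if missing
    admin:  dict mapping unit id -> administrative (auxiliary) value
    The ratio is the sum of reported values over the sum of administrative
    values, computed over responding units only.
    """
    respondents = [u for u, y in survey.items() if y is not None]
    ratio = sum(survey[u] for u in respondents) / sum(admin[u] for u in respondents)
    return {u: (y if y is not None else ratio * admin[u]) for u, y in survey.items()}


survey = {"a": 200.0, "b": 400.0, "c": None}
admin = {"a": 100.0, "b": 200.0, "c": 50.0}
completed = ratio_impute(survey, admin)  # unit "c" is imputed as 2.0 * 50.0
```

In practice the ratio would usually be estimated within imputation classes (for example by industry and size), which this sketch omits.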
Use of administrative data for business surveys and economic data • Specific topics from the papers: • At Statistics Canada, an Economic Census was developed based on high-quality administrative data, the Business Register and survey data. • At Statistics New Zealand and the Census Bureau a comprehensive program is being carried out to incorporate more administrative data into the survey processes of economic data by replacing survey data of smaller units and moving towards selective editing techniques.
Use of administrative data for business surveys and economic data • Specific topics from the papers: • Both Statistics New Zealand and Statistics Denmark discuss edit and imputation processes specifically for trade statistics where administrative data is fundamental to the imputation of missing and erroneous data. • Statistics Canada presents different methods for outlier detection as a special case of selective editing techniques. • Both Statistics Canada and the Census Bureau assess the quality of outputs based on administrative data by the impact on the efficiency of the final point estimates.
Combining Data Sources • The emphasis of the papers is on linking multiple high quality administrative data sources to model and impute target variables for social surveys. • The more sources are linked together, the higher the risk of errors through conflicting values of variables. Each data source must be assessed for its completeness and accuracy to avoid introducing new errors into the statistical data. • Administrative data improves the quality of statistical data through error localization, imputation models, outlier detection, and selective editing techniques. It also reduces the need for edit and imputation. • Boundaries between edit and imputation are constantly moving due to the use of multiple sources of data. Administrative data supports the detection and correction of errors. It also provides a source of data as a reference file for imputation.
Combining Data Sources • Other main points: • Administrative data supports both the error detection and error correction processes: • by supplementing survey data and allowing for better model specification for imputation either by adding covariates or by actually replacing missing or erroneous data; • for use as a reference file to confirm erroneous values of variables and reasons for failed edit checks; • for quality assurance to identify errors resulting from either the data collection phase or the data processing phase. • Prior knowledge and understanding of the data in a multi-source data collection is essential for the selection and integration of the data sources.
Combining Data Sources • Specific topics from the papers: • At Statistics Norway multiple administrative data sources are linked to obtain employment characteristics. The electronic data capture has a large impact on the development of integrated and coherent statistical systems. • Papers demonstrate methods for identifying units, timeliness of the variables, definitions and classifications in order to merge multiple administrative sources and develop imputation models for target variables not present in the data sources. • CBS Israel has wide experience working with multiple sources of administrative data and its use for both the editing stage and the imputation stage and also supporting other statistical processing.
Other Processes Supporting Edit and Imputation • Many survey processes are based on traditional methods of collecting survey data. With more use of multiple data sources, statistical processing has to encompass all of the statistical data, both survey and administrative data. • The edit and imputation processes and their validation provide important metadata, which later supply users with key explanations of movements in the series. • Other statistical processes supported by the edit and imputation processes include record linkage, coding and the imputation of new variables, as well as quality assessment of the final outputs.
Other Processes Supporting Edit and Imputation • Other main points: • The need to understand and interpret register data through a uniform reference frame and in a standard format is vital to both producers and users of the statistical data. • Quality dimensions are enhanced by the use of administrative data with respect to coherence, consistency, comparability, completeness and accuracy. • Imputation for new variables is supported by administrative data by providing better models, more covariates and definitions of weighting classes or the direct replacement by administrative data.
Other Processes Supporting Edit and Imputation • Specific topics from the paper: • Owners of administrative registers often do not hold information about the data in electronic format. The challenge for NSIs is to translate this information about the data into structured metadata. • Statistics Finland uses the Common Structure of Statistical Information (CSOSI) method, and gives an example of the system when applied to personal taxation data and to the administrative information describing it. • When registers are used in the survey process, producers of statistics must ensure that users gain a good understanding of the content so that they can interpret the data accurately.
Editing Administrative Data and Combined Data Sources Points for Discussion
Use of administrative data for business surveys and economic data • Points for discussion: • How can differences in definitions, classifications and timeliness of variables in administrative data be reconciled with survey data without introducing new bias into the data? • Can we automatically assume that administrative data has higher quality than survey data? How should thresholds be set below which administrative data should not be used at all? • Can administrative data directly replace survey data? • Quality measures in the papers focused on the efficiency of point estimates. Are there other quality measures that measure the impact of using administrative data in survey processes, in particular at a micro level?
Use of administrative data for business surveys and economic data • Points for discussion: • How can edit rules be managed and updated to take into account dynamic and constantly changing administrative sources? • Selective editing thresholds described in the papers were determined by budget constraints. Can we incorporate historical data, external knowledge and the influence on the final estimates into the setting of thresholds? • Selective editing techniques for administrative data target larger statistical units; however, smaller units are typically the ones used for replacing survey data. Is there a way to efficiently edit smaller units through selective editing techniques?
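To make the threshold question concrete, the sketch below shows one common form of a selective editing score, in which a unit's score is its weighted deviation from an anticipated value (for instance a historical or administrative value) relative to an estimated total. The field names, the scoring formula chosen and the threshold value are illustrative assumptions, not the approach of any specific paper.

```python
def selective_editing_scores(units, estimated_total):
    """Score each unit by its potential influence on the estimated total.

    units: list of dicts with the hypothetical keys
           "id", "weight", "reported" and "anticipated"
    score = weight * |reported - anticipated| / estimated_total
    """
    return {
        u["id"]: u["weight"] * abs(u["reported"] - u["anticipated"]) / estimated_total
        for u in units
    }


def units_for_review(scores, threshold):
    """Return ids with a score above the threshold, most influential first."""
    selected = [uid for uid, s in scores.items() if s > threshold]
    return sorted(selected, key=lambda uid: -scores[uid])


units = [
    {"id": "A", "weight": 1, "reported": 1000, "anticipated": 900},
    {"id": "B", "weight": 10, "reported": 50, "anticipated": 48},
    {"id": "C", "weight": 5, "reported": 300, "anticipated": 100},
]
scores = selective_editing_scores(units, estimated_total=10000)
review = units_for_review(scores, threshold=0.005)
```

With these made-up figures, units C and A exceed the threshold and unit B does not. Because the score depends on the weight as well as the deviation, a small unit with a large weight can still be selected, which bears on the question of editing smaller units.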
Use of administrative data for business surveys and economic data • Points for discussion: • Outlier detection methodology is proposed as a selective editing technique but it does not necessarily target the most influential units. Can the methodologies be combined and how should thresholds be determined? • Can selective editing techniques be carried out for multi-variate editing? How can we measure the impact of multiple influential variables and set thresholds in this framework? • How can better imputation models be developed for administrative data, as opposed to survey data, that make more use of historical data and multiple data sources? For example, can units reporting monthly be used to impute units reporting bi-monthly or half-yearly?
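One simple way to picture the last question is to split a bi-monthly total into two monthly values using the month-to-month shares observed among monthly reporters. The sketch below is only an illustration under that assumption; the function name and data layout are hypothetical, and a real application would typically compute shares within comparable groups of units.

```python
def split_bimonthly(bimonthly_total, monthly_reporters):
    """Allocate a two-month total to its two months.

    monthly_reporters: list of (month1_value, month2_value) pairs from
    units that report monthly. The first month's share is estimated from
    their aggregate values; the remainder goes to the second month, so the
    two imputed values always add up to the reported bi-monthly total.
    """
    month1 = sum(a for a, _ in monthly_reporters)
    month2 = sum(b for _, b in monthly_reporters)
    share1 = month1 / (month1 + month2)
    first = bimonthly_total * share1
    return first, bimonthly_total - first


first, second = split_bimonthly(500.0, [(40.0, 60.0), (20.0, 30.0)])
```

Here the monthly reporters place 40 percent of their two-month activity in the first month, so the total of 500 is split into roughly 200 and 300.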
Combining Data Sources • Points for discussion: • Can we develop a mechanism to influence the methods of data collection from suppliers of administrative data in terms of content and format to ensure more generic pre-processing and edit and imputation processes? • The integration of multiple data sources can result in introducing new errors. How should the quality of the variables be assessed in a multi-source data collection, in particular when having to choose between values for the same variable? • The quality of administrative data can vary widely. When we consider combining data sources, should they all be of a similar quality?
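When the same variable is available from several sources, one simple rule is to rank the sources by assessed quality and take the first non-missing value. The sketch below shows that rule only as an illustration; the source names and the ranking are hypothetical, and the papers may use different or more elaborate approaches.

```python
def resolve_value(candidates, priority):
    """Pick a value for one variable from several sources.

    candidates: dict mapping source name -> value, or None if missing
    priority:   list of source names, most trusted first (an assumption
                that has to come from a prior quality assessment)
    Returns (value, source), or (None, None) if every source is missing.
    """
    for source in priority:
        value = candidates.get(source)
        if value is not None:
            return value, source
    return None, None


candidates = {"tax_register": None, "employer_register": 42, "survey": 40}
priority = ["tax_register", "employer_register", "survey"]
value, source = resolve_value(candidates, priority)
```

Returning the source along with the value keeps a record of where each value came from, which can later be used for quality assessment.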
Other Processes Supporting Edit and Imputation • Points for discussion: • Is there a mechanism by which we can influence the suppliers of administrative registers to collect and maintain metadata in a machine readable format? • How can we best integrate content information about administrative registers into the metadata describing the overall statistical processing operation, in particular with different formats of data?
Editing Administrative Data and Combined Data Sources Conclusions and Future Research
Underlying theme in all of the papers: The use of administrative data for survey processing and in particular for supporting efficient edit and imputation processes based on error localization techniques, imputation modeling and selective editing techniques, increases the quality of the statistical data and reduces response burden and costs. There is a clear need for standardization/harmonization of definitions and concepts to facilitate the use of multiple sources of administrative data within the survey process.
Future Research: • The development of generic modules for editing and imputation of administrative data is particularly challenging since data collection methods and formats vary greatly depending on the source. • More research needs to go into the development of common portals and electronic data collection which will have a direct effect on methods used for editing and imputation. • Better modeling techniques, edit and imputation processes and quality indicators are needed to assess and correct administrative data prior to its use in statistical processing as well as to increase the quality of the final product.
Future Research: • Further development of a time series methodology approach for error localization and imputation of administrative data which usually have rich historical data. • Administrative data is diverse and may include both numerical and categorical data. The edit and imputation modules have to be able to handle both types of data. • Better methods for setting selective editing thresholds for administrative data based on the influence of the variable as well as the development of a multi-variate framework.
Editing Administrative Data and Combined Data Sources Thank you for your attention! Natalie and Heather