
EDITING OF MULTIPLE SOURCE DATA IN THE CASE OF SLOVENIAN AGRICULTURAL CENSUS 2010

This presentation provides an overview of the editing process for multiple source data in the Slovenian Agricultural Census 2010, including database organization, statistical data processing, and main challenges faced. The use of different data sources and the impact of outsourcing data collection are discussed.



Presentation Transcript


  1. EDITING OF MULTIPLE SOURCE DATA IN THE CASE OF SLOVENIAN AGRICULTURAL CENSUS 2010
     Rudi Seljak, Aleš Krajnc
     Statistical Office of the Republic of Slovenia

  2. Overview of the presentation
     • General information about the Agricultural Census (AC 2010)
     • Database organization
     • Statistical data processing
     • Main problems and challenges
     • Conclusions

  3. General information about the AC 2010
     • Collection of exhaustive information on all agricultural holdings (AH) that fulfil the criteria stated in the EU regulation.
     • In accordance with the EU regulation, it is conducted every 10 years.
     • Conducted in 2010 in most EU Member States (in a few in 2009).
     • The aim of the binding regulation is to obtain, for the first time, comparable data on agricultural indicators based on the same methodology.

  4. Slovenian AC 2010
     • Carried out by the Statistical Office of the Republic of Slovenia (SURS) in June-July 2010.
     • Part of the data was collected with a field survey (CAPI) and a (large) part was obtained from different administrative sources.
     • 94,686 AH were visited in the field, of which 74,646 satisfied the ECA criteria.
     • The fieldwork and the data entry program were handled by an outsourced company, but all the instructions and rules were provided by SURS staff.
     • About 600 interviewers finished the fieldwork in approx. 75 days.

  5. Micro-data database
     • Field data were separated into different tables according to sets of related questions.
     • Each administrative source was put in a separate table.
     • Each table was “accompanied” by the statuses of its variables. The status “flagged” the collection mode and also each change made during the process.
     • Each table has one associated table into which all the changed records are inserted.
     • Views of the different versions of the data were created.
     • Altogether, 199 tables and views and 9,583 variables had to be processed.

  6. Database – schematic presentation
     [Diagram]
     • Tables: TabX (data) and TabX_S (statuses), each with an associated table of changed records (TabX_edi, TabX_S_edi).
     • Views, for the data and for the statuses: all versions of the record; last version of the record.
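The table-and-view layout described in slides 5 and 6 can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module and covers only the data table and its edit table (the status tables would follow the same pattern). The column names `unit_id`, `version` and `area_ha`, the view names `TabX_all` and `TabX_last`, and the convention that the original record is version 0 are assumptions made for illustration; the presentation does not describe the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Original records (hypothetical columns).
CREATE TABLE TabX (unit_id INTEGER PRIMARY KEY, area_ha REAL);
-- Changed records are inserted here, never overwritten in TabX.
CREATE TABLE TabX_edi (unit_id INTEGER, version INTEGER, area_ha REAL);

-- View of all versions: the original counts as version 0.
CREATE VIEW TabX_all AS
    SELECT unit_id, 0 AS version, area_ha FROM TabX
    UNION ALL
    SELECT unit_id, version, area_ha FROM TabX_edi;

-- View of the last version: highest version number per unit.
CREATE VIEW TabX_last AS
    SELECT a.unit_id, a.version, a.area_ha
    FROM TabX_all AS a
    WHERE a.version = (SELECT MAX(b.version)
                       FROM TabX_all AS b
                       WHERE b.unit_id = a.unit_id);
""")

conn.execute("INSERT INTO TabX VALUES (1, 12.5), (2, 3.0)")
# Unit 1 was corrected twice during editing; unit 2 was never changed.
conn.execute("INSERT INTO TabX_edi VALUES (1, 1, 11.0), (1, 2, 10.5)")

last_versions = conn.execute(
    "SELECT unit_id, version, area_ha FROM TabX_last ORDER BY unit_id"
).fetchall()
all_count = conn.execute("SELECT COUNT(*) FROM TabX_all").fetchone()[0]
print(last_versions)
print(all_count)
```

Keeping every change as a new row is what makes the process traceable: the "all versions" view shows the full edit history, while downstream processing reads only the "last version" view.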

  7. Statistical data processing
     • A combination of a general application and custom-made computer programs was used for data processing.
     • Custom-made programs:
       - Insertion of new units: units that were not AH according to the field data, but for which the administrative data indicated the opposite
       - Replacement of the whole set of data in cases where the field data were of bad quality
       - Calculation of the derived variables
     • General application:
       - Logical controls
       - Individual and systematic corrections
       - Imputation
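As an illustration of what "logical controls" can look like when the rules are kept as data rather than hard-coded, here is a minimal Python sketch. The rule identifiers, the variable names `total_area_ha` and `utilised_area_ha`, and the rules themselves are hypothetical examples, not the controls actually used in AC 2010.

```python
# Hypothetical rule list: (rule id, description, check function).
# A record passes a rule when its check function returns True.
RULES = [
    ("R1", "total area must be non-negative",
     lambda r: r["total_area_ha"] >= 0),
    ("R2", "utilised area cannot exceed total area",
     lambda r: r["utilised_area_ha"] <= r["total_area_ha"]),
]


def run_controls(records, rules=RULES):
    """Return a (unit_id, rule_id) pair for every failed control."""
    failures = []
    for rec in records:
        for rule_id, _description, check in rules:
            if not check(rec):
                failures.append((rec["unit_id"], rule_id))
    return failures


records = [
    {"unit_id": 1, "total_area_ha": 10.0, "utilised_area_ha": 8.0},
    {"unit_id": 2, "total_area_ha": 5.0, "utilised_area_ha": 7.5},
    {"unit_id": 3, "total_area_ha": -1.0, "utilised_area_ha": 0.0},
]

failures = run_controls(records)
print(failures)
```

The list of failures is what a later correction or imputation step would work from; adding a control means adding a row to the rule list, not changing the program.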

  8. General application
     • A metadata-driven application for data editing, which is also used in several other surveys (including the population census).
     • Due to the requirements of the AC 2010 data processing, some additional functionalities were added:
       - A general metadata-driven process for the linkage of an arbitrary number of tables
       - A general metadata-driven process for the calculation of “aggregated derived variables”: data at the level of persons who work at the AH are aggregated to the level of the AH
       - Several new imputation methods
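The idea of "aggregated derived variables" driven by metadata can be sketched as follows. This is an illustrative Python example only: the metadata format, the variable names (`holding_id`, `person_id`, `hours_worked`) and the derived variables are assumptions, since the presentation does not show how the SURS application stores its metadata.

```python
from collections import defaultdict

# Hypothetical metadata: derived variable -> (person-level source variable,
# aggregate function applied to the list of values within one holding).
DERIVED = {
    "n_persons": ("person_id", len),
    "total_hours": ("hours_worked", sum),
}


def aggregate_to_holding(persons, derived=DERIVED):
    """Aggregate person-level records to one record per holding."""
    grouped = defaultdict(list)
    for person in persons:
        grouped[person["holding_id"]].append(person)

    result = {}
    for holding_id, members in grouped.items():
        result[holding_id] = {
            name: func([m[source] for m in members])
            for name, (source, func) in derived.items()
        }
    return result


persons = [
    {"holding_id": 100, "person_id": 1, "hours_worked": 1800},
    {"holding_id": 100, "person_id": 2, "hours_worked": 600},
    {"holding_id": 200, "person_id": 3, "hours_worked": 900},
]

holding_totals = aggregate_to_holding(persons)
print(holding_totals)
```

Because the derived variables are listed in a table rather than written into the code, subject-matter staff could in principle add a new one without programming work, which matches the "greater independence from IT" argument made for the general application.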

  9. General application – pros and cons
     • Pros:
       - Greater independence from IT staff
       - IT (programming) work decreased significantly
       - Traceability and repeatability are ensured
       - The process is documented through the metadata database
     • Cons:
       - More skilled subject-matter personnel needed
       - A lot of metadata produced → sometimes difficult to manage and control

  10. AC 2010 – main challenges
     • AC 2010 is by its nature a very demanding survey:
       - Large number of units and variables
       - Data at different levels (agricultural holding + persons working at the holding)
     • The combination of different data sources makes the job even more complicated.
     • The creation of rules (process metadata) was spread among several subject-matter staff, each of them covering one of the areas → overall coordination was quite a demanding task.
     • In the first phase, many syntax errors were produced.

  11. AC 2010 – main challenges cont’d
     • The large number of variables required a large number of process steps (e.g. 16 steps in the imputation part of the process) → sometimes difficult to follow the process and ensure consistency in the corrected data.
     • Integration of the data from two different sources was a special challenge:
       - Priority setting in the case of “overlapping” sources
       - Large differences between data from different sources had to be resolved → very time-consuming
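The two integration problems named on this slide, priority setting and large differences between sources, can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the per-variable priority table, the variable names, and the 20% relative tolerance are not taken from the presentation.

```python
# Hypothetical priority metadata: for each variable, sources in order of trust.
SOURCE_PRIORITY = {
    "area_ha": ["admin", "field"],
    "n_cattle": ["field", "admin"],
}


def integrate(field, admin, priority=SOURCE_PRIORITY, tolerance=0.2):
    """Merge two source records for one holding.

    Returns (merged, review): merged holds the value from the highest-priority
    source that has one; review lists variables where both sources report a
    value and the relative difference exceeds the tolerance.
    """
    sources = {"field": field, "admin": admin}
    merged, review = {}, []
    for var, order in priority.items():
        # Keep only the sources that actually report this variable.
        values = {s: sources[s][var] for s in order
                  if sources[s].get(var) is not None}
        if not values:
            merged[var] = None
            continue
        chosen = next(s for s in order if s in values)
        merged[var] = values[chosen]
        if len(values) == 2:
            a, b = values["field"], values["admin"]
            if abs(a - b) > tolerance * max(abs(a), abs(b)):
                review.append(var)
    return merged, review


field_record = {"area_ha": 12.0, "n_cattle": 30}
admin_record = {"area_ha": 11.5, "n_cattle": 12}

merged, review = integrate(field_record, admin_record)
print(merged)
print(review)
```

Here the area values agree closely, so the higher-priority administrative value is taken without further action, while the cattle counts differ too much and are flagged. Resolving such flagged cases is the kind of work the slide describes as very time-consuming.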

  12. Conclusions – points for discussion
     • What is the influence of outsourcing the data collection on the quality of the incoming data?
       - Importance of active cooperation of SURS staff in the testing of the questionnaire and the training of interviewers
     • Usage of combined data sources:
       - Large advantage in decreasing the reporting burden
       - No large influence on cost reduction
       - Increased workload at the data editing stage
       - Usage of different sources increased the quality of the final micro-data
       - The challenge is to find the balance between these factors

  13. Conclusions – points for discussion cont’d
     • Complexity of data processing:
       - Balance between the usage (and, if needed, upgrade) of general IT solutions and the creation of custom-made programs
     • Micro-data are provided to Eurostat and made available to researchers:
       - Can we still afford selective data editing?

  14. Thank you for your attention
