
USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE

Matjaž Jug, Pavle Kozjek, Tomaž Špeh, Statistical Office of the Republic of Slovenia (SORS)





Presentation Transcript


  1. USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE Matjaž Jug, Pavle Kozjek, Tomaž Špeh Statistical Office of the Republic of Slovenia

  2. Overview • Current statistical production cycle in SORS • Using the metadata in Blaise applications • The role of metadata in automatic editing system in SAS • Metadata connected with the data in Oracle data warehouse • Lessons learnt • Questions

  3. Current statistical production cycle • Entry and micro editing (Blaise) • Macro and statistical editing (SAS) • Storing and analysis (Oracle) • Dissemination (PC-Axis) • Central metadata stores (Klasje & Metis)

  4. Using the metadata in Blaise applications • Generation of high-speed data-entry applications with Gentry (usable by non-IT personnel) • Metadata-based transformations between different data structures (EXTRA FAT, FAT, THIN)

  5. Gentry – tool for generation of the Blaise data-entry application • Questionnaire structure and layout (name, blocks, tables, routing etc.) • Field characteristics (length, data type, constants, other parameters) [Screenshots: data type and field characteristics settings]
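The slide lists the metadata Gentry consumes (questionnaire structure, field characteristics) but does not show its input format or output. The sketch below only illustrates the idea of generating a data-entry model from field metadata. The `FieldMeta` record, the two supported data types and the generated text are assumptions: the output is simplified Blaise-style syntax (DATAMODEL, FIELDS, RULES, ENDMODEL) and has not been checked against real Gentry output.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """Hypothetical field-level metadata, not Gentry's actual format."""
    name: str
    label: str
    data_type: str   # assumed vocabulary: "string" or "integer"
    length: int = 0  # used for strings
    low: int = 0     # used for integers
    high: int = 0


def field_declaration(f: FieldMeta) -> str:
    """Render one Blaise-style field declaration from its metadata."""
    if f.data_type == "string":
        type_spec = f"STRING[{f.length}]"
    elif f.data_type == "integer":
        type_spec = f"{f.low}..{f.high}"
    else:
        raise ValueError(f"unsupported data type: {f.data_type}")
    return f'  {f.name} "{f.label}" : {type_spec}'


def generate_datamodel(model_name: str, fields: list[FieldMeta]) -> str:
    """Assemble a minimal model: field declarations, then RULES in order."""
    lines = [f"DATAMODEL {model_name}", "FIELDS"]
    lines += [field_declaration(f) for f in fields]
    lines.append("RULES")
    # Simplest possible routing: ask every field in declaration order.
    lines += [f"  {f.name}" for f in fields]
    lines.append("ENDMODEL")
    return "\n".join(lines)


if __name__ == "__main__":
    meta = [
        FieldMeta("UnitId", "Reporting unit ID", "string", length=10),
        FieldMeta("Employees", "Number of employees", "integer", low=0, high=99999),
    ]
    print(generate_datamodel("Table12", meta))
```

Because the generator reads only metadata, changing a field's length or range means editing the metadata, not the application. This is the property that lets non-IT staff produce data-entry programs.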

  6. Gentry – example of a generated application [Screenshot: header section; data entry for table 12]

  7. Transformations • EXTRA FAT: all data for one unit (provider) in one row; suitable for micro editing • (metadata-based transformation in Blaise) • FAT: classification and continuous variables in the columns; suitable for analysis • (metadata-based transformation in SAS) • THIN: classification variables in the columns and continuous variables in the rows
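The three layouts differ only in where variables sit, so a small metadata table describing each EXTRA FAT column is enough to drive both reshaping steps. The sketch assumes a single classification ("sex", values M/F) and invents the column names (`emp_m`, `wage_f`, ...) and variables (`employees`, `wages`). In SORS the first step runs in Blaise and the second in SAS; this Python version only shows the metadata-driven logic.

```python
# Hypothetical column metadata: each EXTRA FAT column name maps to the
# continuous variable it holds and the classification value it belongs to.
COLUMN_META = {
    "emp_m": ("employees", "M"),
    "emp_f": ("employees", "F"),
    "wage_m": ("wages", "M"),
    "wage_f": ("wages", "F"),
}
# Continuous variables, derived from the metadata in first-seen order.
CONTINUOUS = list(dict.fromkeys(var for var, _ in COLUMN_META.values()))


def extra_fat_to_fat(rows, meta=COLUMN_META):
    """EXTRA FAT (one row per unit) -> FAT (one row per unit and class)."""
    fat = {}
    for row in rows:
        for column, (variable, sex) in meta.items():
            if column not in row:
                continue  # column not collected for this unit
            key = (row["unit"], sex)
            record = fat.setdefault(key, {"unit": row["unit"], "sex": sex})
            record[variable] = row[column]
    return list(fat.values())


def fat_to_thin(rows, continuous=CONTINUOUS):
    """FAT -> THIN: continuous variables move from columns into rows."""
    thin = []
    for record in rows:
        for variable in continuous:
            if variable in record:
                thin.append({"unit": record["unit"], "sex": record["sex"],
                             "variable": variable, "value": record[variable]})
    return thin


if __name__ == "__main__":
    extra_fat = [{"unit": "U1", "emp_m": 10, "emp_f": 7, "wage_m": 300, "wage_f": 250}]
    fat = extra_fat_to_fat(extra_fat)
    for row in fat_to_thin(fat):
        print(row)
```

Adding a new variable or classification value changes only `COLUMN_META`; neither function needs to be rewritten.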

  8. The role of metadata in automatic editing system in SAS • General system for automated editing • Process metadata

  9. The role of metadata in automatic editing system in SAS • To be general, the tool must be able to recognize: • which data are to be subjected to editing and/or imputation; • which editing method should be applied; • with what parameters
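The three requirements on this slide (which data, which method, which parameters) map naturally onto a rule table that a generic engine reads. The sketch below is hypothetical: the rule layout, the valid ranges, the 1.05 growth factor and the `prev_` naming convention are invented. The flags it writes reuse the process-indicator codes from slides 10 to 14: level 1 is 4 (non-response) or 5 (invalid value), level 2 is 2 (corrected, following the 42.19 example), and level 3 is the imputation method code.

```python
from statistics import mean

# Hypothetical process metadata: which variable to check, the valid range
# (None = unbounded), the method to apply and its parameters.
EDIT_RULES = [
    {"variable": "employees", "valid_range": (0, 100000), "method": "mean", "params": {}},
    {"variable": "turnover", "valid_range": (0, None), "method": "historical",
     "params": {"growth": 1.05}},
]

# Level-3 imputation codes as listed on slide 14.
METHOD_CODES = {"zero": "10", "historical": "12", "mean": "13"}


def needs_treatment(value, valid_range):
    """A value is due for imputation if it is missing or out of range."""
    low, high = valid_range
    if value is None:
        return True
    return (low is not None and value < low) or (high is not None and value > high)


def impute_zero(record, variable, accepted, params):
    return 0


def impute_mean(record, variable, accepted, params):
    return mean(accepted)  # raises StatisticsError if nothing was accepted


def impute_historical(record, variable, accepted, params):
    # Previous-period value scaled by an assumed growth factor.
    return record["prev_" + variable] * params["growth"]


METHODS = {"zero": impute_zero, "mean": impute_mean, "historical": impute_historical}


def run_editing(records, rules=EDIT_RULES):
    """Apply every rule; the engine itself knows no variable names."""
    for rule in rules:
        var, valid = rule["variable"], rule["valid_range"]
        accepted = [r[var] for r in records if not needs_treatment(r.get(var), valid)]
        impute = METHODS[rule["method"]]
        for r in records:
            original = r.get(var)
            if not needs_treatment(original, valid):
                continue
            r[var] = impute(r, var, accepted, rule["params"])
            # Level 1: 4 = non-response, 5 = invalid value; level 2: 2 = corrected.
            mode = "4" if original is None else "5"
            r.setdefault("flags", {})[var] = f"{mode}2.{METHOD_CODES[rule['method']]}"
    return records
```

Keeping the method choice and its parameters in `EDIT_RULES` rather than in code is what makes the engine reusable across surveys.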

  10. Process indicators – level 1 • Mode of data collection • 1 data provided directly by reporting unit • 2 data from administrative source • 3 data computed from original values • 4 imputed data – imputation of non-response • 5 imputed data – imputation due to invalid values detected through the editing process • 6 data missing because the unit is not eligible for the item (logical skip)

  11. Process indicators – level 2 • Data status • 1 original value • 2 corrected value

  12. Process indicators – level 3 • Method of data correction • 11 correction after telephone contact • 12 data reported at a later stage

  13. Process indicators – level 3 • Reporting methods • 11 reporting by mail questionnaire • 12 computer assisted telephone interview (CATI) • 13 telephone interview without computer assistance • 14 paper assisted personal interview (PAPI) • 15 computer assisted personal interview (CAPI) • 16 paper assisted self interviewing • 17 computer assisted self interviewing • 18 web reporting

  14. Process indicators – level 3 • Imputation methods • 10 method of zero values • 11 logical imputation • 12 historical data imputation • 13 mean values imputation • 14 nearest neighbour imputation • 15 hot-deck imputation • 16 cold-deck imputation • 17 regression imputation • 18 method of the most frequent value • 19 estimation of annual value based on infra-annual data • 21 stochastic hot-deck (random donor) • 22 regression imputation with random residuals • 23 multiple imputation

  15. Process indicators examples – xy.zz • 11.15 means: 1 – data provided directly by reporting unit; 11 – original value; 11.15 – computer assisted personal interview (CAPI) • 42.19 means: 4 – imputed data (imputation of non-response); 42 – corrected value; 42.19 – estimation of annual value based on infra-annual data
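The two examples show that the level-3 code is read against a different list depending on the first two digits: 15 is a reporting method in 11.15, but 19 is an imputation method in 42.19. The slides do not state the selection rule. The sketch below assumes it: imputed data (level 1 = 4 or 5) use the imputation list, other corrected values (level 2 = 2) use the correction list, and everything else uses the reporting list. The code lists are copied from slides 10 to 14.

```python
import re

LEVEL1_MODE = {
    "1": "data provided directly by reporting unit",
    "2": "data from administrative source",
    "3": "data computed from original values",
    "4": "imputed data - imputation of non-response",
    "5": "imputed data - imputation due to invalid values detected through the editing process",
    "6": "data missing because the unit is not eligible for the item (logical skip)",
}
LEVEL2_STATUS = {"1": "original value", "2": "corrected value"}
CORRECTION_METHODS = {
    "11": "correction after telephone contact",
    "12": "data reported at a later stage",
}
REPORTING_METHODS = {
    "11": "reporting by mail questionnaire",
    "12": "computer assisted telephone interview (CATI)",
    "13": "telephone interview without computer assistance",
    "14": "paper assisted personal interview (PAPI)",
    "15": "computer assisted personal interview (CAPI)",
    "16": "paper assisted self interviewing",
    "17": "computer assisted self interviewing",
    "18": "web reporting",
}
IMPUTATION_METHODS = {
    "10": "method of zero values",
    "11": "logical imputation",
    "12": "historical data imputation",
    "13": "mean values imputation",
    "14": "nearest neighbour imputation",
    "15": "hot-deck imputation",
    "16": "cold-deck imputation",
    "17": "regression imputation",
    "18": "method of the most frequent value",
    "19": "estimation of annual value based on infra-annual data",
    "21": "stochastic hot-deck (random donor)",
    "22": "regression imputation with random residuals",
    "23": "multiple imputation",
}


def decode(indicator: str) -> dict:
    """Split an xy.zz indicator into its three levels and describe each."""
    m = re.fullmatch(r"(\d)(\d)\.(\d{2})", indicator)
    if m is None:
        raise ValueError(f"not an xy.zz process indicator: {indicator!r}")
    x, y, zz = m.groups()
    # Assumed rule for choosing the level-3 list (not stated on the slides).
    if x in ("4", "5"):
        table = IMPUTATION_METHODS
    elif y == "2":
        table = CORRECTION_METHODS
    else:
        table = REPORTING_METHODS
    return {
        "level1": LEVEL1_MODE[x],
        "level2": LEVEL2_STATUS[y],
        "level3": table.get(zz, "unknown level-3 code"),
    }


if __name__ == "__main__":
    for code in ("11.15", "42.19"):
        print(code, decode(code))
```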

  16. Statistical process [Diagram linking key responders and other units with Blaise, SAS and Oracle]

  17. Metadata connected with the data in Oracle data warehouse • On-line access to: • Historical data • Data from different phases (not only final data) • Data for multiple surveys (not only data marts) • Statistical (variables & classifications) and process (time stamps, status indicators...) metadata connected with the data • ... accessible to third-party tools

  18. Conceptual star schema for SBS – THIN table design [Diagram]
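The slide's diagram is not reproduced here. The sketch below only illustrates the idea from the previous slide: a THIN fact table where the variable is a dimension (continuous variables in rows), and each value carries its phase, process indicator and time stamp. This lets a query retrieve every phase of one value, not only the final data. SQLite stands in for Oracle. All table and column names, and the sample rows, are invented for illustration and are not the SORS SBS design.

```python
import sqlite3

# Hypothetical star schema: dimensions plus a THIN fact table.
SCHEMA = """
CREATE TABLE dim_variable (var_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
CREATE TABLE dim_activity (nace TEXT PRIMARY KEY, label TEXT);
CREATE TABLE dim_phase    (phase_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_value (
    unit_id   TEXT,
    period    TEXT,
    nace      TEXT REFERENCES dim_activity(nace),
    var_id    INTEGER REFERENCES dim_variable(var_id),
    phase_id  INTEGER REFERENCES dim_phase(phase_id),
    value     REAL,
    indicator TEXT,      -- process indicator in xy.zz form
    loaded_at TEXT       -- time stamp (process metadata)
);
"""

# All phases of one variable for one unit and period, with process metadata.
QUERY = """
SELECT p.name, f.value, f.indicator, f.loaded_at
FROM fact_value f
JOIN dim_variable v ON v.var_id = f.var_id
JOIN dim_phase p ON p.phase_id = f.phase_id
WHERE f.unit_id = ? AND f.period = ? AND v.name = ?
ORDER BY p.phase_id
"""


def build(conn):
    """Create the schema and load invented sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_variable VALUES (?, ?, ?)",
                     [(1, "turnover", "EUR thousand"), (2, "employees", "persons")])
    conn.execute("INSERT INTO dim_activity VALUES (?, ?)", ("C", "Manufacturing"))
    conn.executemany("INSERT INTO dim_phase VALUES (?, ?)",
                     [(1, "entry"), (2, "edited"), (3, "final")])
    conn.executemany("INSERT INTO fact_value VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
        ("U2", "2004", "C", 1, 1, -5, "11.18", "2005-03-01"),
        ("U2", "2004", "C", 1, 2, 210, "52.12", "2005-04-10"),
        ("U2", "2004", "C", 1, 3, 210, "52.12", "2005-05-20"),
    ])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for row in conn.execute(QUERY, ("U2", "2004", "turnover")):
        print(row)
```

Because the variable is a dimension rather than a column, new SBS variables need only new rows in `dim_variable`, and third-party tools can reach the data with plain SQL.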

  19. Lessons learnt • The role of central repositories for metadata • Natural source of conceptual metadata • Metadata have to be exact, complete and consistent • Process metadata should be connected with the data • Harmonisation of metadata concepts • Local metadata vs. global metadata • Cultural change is needed • Technical considerations • The possibilities for metadata exchange and system integration are good (XML, SQL)

  20. Questions
