USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE Matjaž Jug, Pavle Kozjek, Tomaž Špeh Statistical Office of the Republic of Slovenia
Overview
• Current statistical production cycle in SORS
• Using the metadata in Blaise applications
• The role of metadata in the automatic editing system in SAS
• Metadata connected with the data in the Oracle data warehouse
• Lessons learnt
• Questions
Current statistical production cycle
• Entry and micro editing (Blaise)
• Macro and statistical editing (SAS)
• Storing and analysis (Oracle)
• Dissemination (PC-Axis)
• Central metadata stores (Klasje & Metis)
Using the metadata in Blaise applications
• Generation of (high-speed) data-entry applications using Gentry (usable by non-IT personnel)
• Metadata-based transformations between different data structures (EXTRA-FAT, FAT, THIN)
Gentry – tool for generation of the Blaise data-entry application
Inputs read from the metadata:
• Questionnaire structure and layout (name, blocks, tables, routing etc.)
• Field characteristics (length, data type, constants, other parameters)
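The generation step is essentially a metadata-to-source transformation. The sketch below only illustrates that idea and is not the actual Gentry tool: the metadata keys and the emitted field syntax are simplified assumptions, not exact Blaise output.

```python
# Minimal sketch of metadata-driven generation of a data-entry field block.
# The metadata keys and the emitted syntax are illustrative assumptions,
# not the actual Gentry output or exact Blaise syntax.

field_metadata = [
    {"name": "Turnover",  "label": "Annual turnover",     "type": "integer", "length": 12},
    {"name": "Employees", "label": "Number of employees", "type": "integer", "length": 6},
    {"name": "Activity",  "label": "Main activity code",  "type": "string",  "length": 5},
]

def render_field(meta):
    """Render one schematic field definition from its metadata description."""
    if meta["type"] == "integer":
        domain = "0..{}".format(10 ** meta["length"] - 1)
    else:
        domain = "STRING[{}]".format(meta["length"])
    return '  {name} "{label}" : {domain}'.format(domain=domain, **meta)

print("FIELDS")
for m in field_metadata:
    print(render_field(m))
```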
Gentry – example of a generated application (screenshot: header section, data entry for table 12)
Transformations
• EXTRA FAT: all data for one unit (provider) in one row – suitable for micro editing
• metadata-based transformation in Blaise: EXTRA FAT → FAT
• FAT: classification and continuous variables in the columns – suitable for analysis
• metadata-based transformation in SAS: FAT → THIN
• THIN: classification variables in the columns and continuous variables in the rows
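As an illustration of the FAT → THIN step, a metadata-driven reshape only needs to know from the variable metadata which columns are classification variables and which are continuous. The sketch below uses pandas and invented column names; the production transformation at SORS is implemented in SAS, so this is a schematic equivalent, not the actual code.

```python
# Sketch of a metadata-driven FAT -> THIN reshape (wide to long).
# Column names and the metadata dictionary are illustrative assumptions.
import pandas as pd

# The variable metadata drives the reshape: no column names are hard-coded
# in the transformation logic itself.
metadata = {
    "classification_vars": ["unit_id", "activity", "region"],
    "continuous_vars": ["turnover", "employees", "wages"],
}

fat = pd.DataFrame({
    "unit_id":   [1001, 1002],
    "activity":  ["C10", "C25"],
    "region":    ["SI-03", "SI-04"],
    "turnover":  [125000, 98000],
    "employees": [12, 7],
    "wages":     [31000, 18000],
})

# THIN: classification variables stay in the columns, continuous variables
# become (variable, value) rows.
thin = fat.melt(
    id_vars=metadata["classification_vars"],
    value_vars=metadata["continuous_vars"],
    var_name="variable",
    value_name="value",
)
print(thin)
```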
The role of metadata in the automatic editing system in SAS
• General system for automated editing
• Process metadata
The role of metadata in the automatic editing system in SAS
To be general, the tool must be able to:
• recognize which data are to be subjected to editing and/or imputation;
• recognize which editing method should be applied;
• and with what parameters.
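One way to picture such a general tool is a dispatch driven by metadata: the rules name the variables, the method and its parameters, and the program only interprets them. The sketch below is a schematic illustration in Python with invented method names, variables and parameters; the real system is implemented in SAS.

```python
# Sketch of metadata-driven selection of an editing/imputation method.
# Method names, parameters and the record layout are illustrative assumptions.

EDIT_RULES = [
    # which variable, which method, and with what parameters - all read from metadata
    {"variable": "turnover",  "method": "historical", "params": {}},
    {"variable": "employees", "method": "mean",       "params": {"group_by": "activity"}},
]

def impute_mean(record, variable, params, reference_data):
    """Mean of the variable within the group the metadata prescribes."""
    group = params["group_by"]
    same_group = [r[variable] for r in reference_data
                  if r.get(group) == record.get(group) and r.get(variable) is not None]
    return sum(same_group) / len(same_group) if same_group else None

def impute_historical(record, variable, params, reference_data):
    """Placeholder: reuse last period's value carried on the record."""
    return record.get("previous_" + variable)

METHODS = {"mean": impute_mean, "historical": impute_historical}

def edit_record(record, reference_data):
    """Apply the editing rules that the metadata prescribes for missing values."""
    for rule in EDIT_RULES:
        var = rule["variable"]
        if record.get(var) is None:                      # value flagged as missing
            method = METHODS[rule["method"]]
            record[var] = method(record, var, rule["params"], reference_data)
    return record

record = {"activity": "C10", "turnover": None, "employees": 9, "previous_turnover": 120000}
print(edit_record(record, reference_data=[{"activity": "C10", "employees": 12}]))
```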
Process indicators – level 1
Mode of data collection:
• 1 data provided directly by reporting unit
• 2 data from administrative source
• 3 data computed from original values
• 4 imputed data – imputation of non-response
• 5 imputed data – imputation due to invalid values detected through the editing process
• 6 data missing because the unit is not eligible for the item (logical skip)
Process indicators – level 2
Data status:
• 1 original value
• 2 corrected value
Process indicators – level 3
Method of data correction:
• 11 correction after telephone contact
• 12 data reported at a later stage
Process indicators – level 3
Reporting methods:
• 11 reporting by mail questionnaire
• 12 computer assisted telephone interview (CATI)
• 13 telephone interview without computer assistance
• 14 paper assisted personal interview (PAPI)
• 15 computer assisted personal interview (CAPI)
• 16 paper assisted self interviewing
• 17 computer assisted self interviewing
• 18 web reporting
Process indicators – level 3
Imputation methods:
• 10 method of zero values
• 11 logical imputation
• 12 historical data imputation
• 13 mean values imputation
• 14 nearest neighbour imputation
• 15 hot-deck imputation
• 16 cold-deck imputation
• 17 regression imputation
• 18 method of the most frequent value
• 19 estimation of annual value based on infra-annual data
• 21 stochastic hot-deck (random donor)
• 22 regression imputation with random residuals
• 23 multiple imputation
Process indicator examples – xy.zz
11.15 means:
• 1 - data provided directly by reporting unit
• 11 - original value
• 11.15 - computer assisted personal interview (CAPI)
42.19 means:
• 4 - imputed data – imputation of non-response
• 42 - corrected value
• 42.19 - estimation of annual value based on infra-annual data
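Decoding an xy.zz indicator is a mechanical lookup against the code lists on the previous slides. The sketch below is illustrative only: the lookup tables are abbreviated, and in practice the applicable level-3 list depends on the level-1 and level-2 codes.

```python
# Sketch: decoding an xy.zz process indicator into its three levels.
# The lookup tables below are abbreviated illustrations of the code lists
# on the previous slides, not a complete implementation.

LEVEL1 = {
    "1": "data provided directly by reporting unit",
    "4": "imputed data - imputation of non-response",
}
LEVEL2 = {"1": "original value", "2": "corrected value"}
LEVEL3 = {
    "15": "computer assisted personal interview (CAPI)",
    "19": "estimation of annual value based on infra-annual data",
}

def decode(indicator: str) -> dict:
    """Split an 'xy.zz' indicator into collection mode, data status and method."""
    xy, zz = indicator.split(".")
    x, y = xy[0], xy[1]
    return {
        "mode of data collection": LEVEL1[x],
        "data status": LEVEL2[y],
        "method": LEVEL3[zz],
    }

print(decode("11.15"))
print(decode("42.19"))
```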
Statistical process (flow diagram): data capture in Blaise for key responders and other units, editing in SAS, storage in Oracle
Metadata connected with the data in the Oracle data warehouse
On-line access to:
• historical data
• data from different phases (not only final data)
• data for multiple surveys (not only data marts)
Statistical metadata (variables & classifications) and process metadata (time stamps, status indicators...) are connected with the data and accessible for third-party tools.
Conceptual star schema for the SBS THIN table design (diagram)
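To make the star-schema idea concrete, the sketch below builds a tiny THIN fact table joined to unit and variable dimensions in an in-memory SQLite database. All table and column names are invented for illustration and do not reflect the actual SBS warehouse design in Oracle.

```python
# Sketch of a star-schema layout for a THIN fact table, using an in-memory
# SQLite database for illustration.  All table and column names are invented;
# they do not reflect the actual SBS warehouse design in Oracle.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
  CREATE TABLE dim_variable (variable_id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
  CREATE TABLE dim_unit     (unit_id INTEGER PRIMARY KEY, activity TEXT, region TEXT);
  CREATE TABLE fact_thin (
      unit_id INTEGER REFERENCES dim_unit(unit_id),
      variable_id INTEGER REFERENCES dim_variable(variable_id),
      ref_year INTEGER,
      value REAL,
      process_indicator TEXT   -- e.g. '11.15', links the value to process metadata
  );
  INSERT INTO dim_variable VALUES (1, 'turnover', 'EUR'), (2, 'employees', 'persons');
  INSERT INTO dim_unit     VALUES (1001, 'C10', 'SI-03'), (1002, 'C25', 'SI-04');
  INSERT INTO fact_thin    VALUES (1001, 1, 2004, 125000, '11.15'),
                                  (1001, 2, 2004, 12,     '42.19');
""")

# Star join: facts plus descriptive metadata from the dimension tables.
for row in con.execute("""
    SELECT u.activity, u.region, v.name, v.unit, f.ref_year, f.value, f.process_indicator
    FROM fact_thin f
    JOIN dim_unit u     ON u.unit_id = f.unit_id
    JOIN dim_variable v ON v.variable_id = f.variable_id
"""):
    print(row)
```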
Lessons learnt
• The role of central repositories for metadata
  • natural source of conceptual metadata
  • metadata have to be exact, complete and consistent
  • process metadata should be connected with the data
• Harmonisation of metadata concepts
  • local metadata vs. global metadata
  • a cultural change is needed
• Technical considerations
  • the possibilities for metadata exchange and system integration are good (XML, SQL)