Introduction in the Source and Metadata hyperdimension. Saskia Ossen and Piet Daas. Content of this module. Introduction in Source and Metadata hyperdimension Introduction of quality checklist Theory and practical examples
Introduction in the Source and Metadata hyperdimension
Saskia Ossen and Piet Daas
Content of this module
• Introduction in Source and Metadata hyperdimension
• Introduction of quality checklist
• Theory and practical examples
• Group exercise in which groups apply the checklist to an “imaginary” source
Quality checklist
• The quality checklist:
  • Can be used to evaluate the Source and Metadata hyperdimensions
  • Contains 34 indicators
  • 51 questions (measurement methods)
  • Takes around 2 hours per data source
  • Findings are expressed at the dimensional level (5 dimensions for Source, 4 for Metadata)
  • Can be found at: http://www.cbs.nl/nl-NL/menu/methoden/onderzoek-methoden/discussionpapers/archief/2009/2009-42-x10-pub.htm
Source hyperdimension
SOURCE:
- Focus on data source as a whole
- Contact information related
- Delivery related aspects
- and more
Evaluation of Source hyperdimension
• Here the data source is viewed as a file delivered by the data source holder to the NSI
• Dimensions (5):
  • Supplier: Contact information, purpose of use
  • Relevance: NSI use, need, effect on response burden
  • Privacy and security: Legal base, confidentiality, security
  • Delivery: Costs, arrangements, format, selection
  • Procedures: Collection, changes, feedback, fall-back scenario
Practical example, Source hyperdimension
Legend: +, good; o, reasonable; -, poor; ?, unclear
Data sources: IPA: Insurance Policy records Administration; SFR: Student Finance Register; CWI: register of Centre for Work and Income; ERR: Exam Results Register; NCP: National Car Pass register; PR: Persons Register; VAT: Value Added Tax data; ICP: Intra-Community Product transactions (EU-countries); NHR: New Housing register
Practical example, Source hyperdimension
• CWI scores ‘poor’ in Delivery
  • Result of delivery issues (delays)
  • These need to be solved (and have been solved)
• VAT scores low in Procedures
  • Related to the back-up scenario: what to do when no data, or only part of the data, is delivered?
  • First research efforts focused purely on direct use; currently the back-up options are being studied
• Other data sources
  • Quite OK (there are always some things that can be improved)
Metadata hyperdimension
METADATA: Focuses on the (availability of the) information required to understand and use the data in the data source
Evaluation of Metadata hyperdimension
• Focuses on the conceptual metadata quality components of the data source
• Dimensions (4):
  • Clarity: Of units, variables, time definitions and changes
  • Comparability: Of units, variables, and time with those of NSI
  • Unique keys: Presence, similarity to NSI, and alternatives
  • Data treatment: Familiarity with checks and modifications (by data source holder)
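The two hyperdimensions and their dimensions can be written down as a small data structure, which makes it easy to check that a completed evaluation has one score per dimension. The following is only a minimal sketch: the dimension names and score symbols (+, o, -, ?) come from the slides, while the dict layout, the function name `validate_score_sheet` and the example scores are assumptions made for illustration and are not part of the checklist itself.

```python
# Dimension names are taken from the slides; the dict layout and the
# function name are illustrative assumptions, not part of the checklist.
CHECKLIST = {
    "Source": ["Supplier", "Relevance", "Privacy and security",
               "Delivery", "Procedures"],
    "Metadata": ["Clarity", "Comparability", "Unique keys", "Data treatment"],
}

# Score symbols from the legend: + good, o reasonable, - poor, ? unclear.
VALID_SCORES = {"+", "o", "-", "?"}


def validate_score_sheet(sheet):
    """Return a list of problems found in a score sheet.

    `sheet` maps hyperdimension -> {dimension -> score symbol}.
    An empty list means every dimension has exactly one valid score.
    """
    problems = []
    for hyper, dimensions in CHECKLIST.items():
        given = sheet.get(hyper, {})
        for dim in dimensions:
            if dim not in given:
                problems.append(f"{hyper}/{dim}: missing score")
            elif given[dim] not in VALID_SCORES:
                problems.append(f"{hyper}/{dim}: invalid score {given[dim]!r}")
        for dim in given:
            if dim not in dimensions:
                problems.append(f"{hyper}/{dim}: unknown dimension")
    return problems


# Hypothetical score sheet, for illustration only (not the slides' table).
example_sheet = {
    "Source": {"Supplier": "+", "Relevance": "+", "Privacy and security": "+",
               "Delivery": "o", "Procedures": "?"},
    "Metadata": {"Clarity": "+", "Comparability": "o", "Unique keys": "+"},
}

print(validate_score_sheet(example_sheet))
# prints ['Metadata/Data treatment: missing score']
```

Scores are kept at the dimensional level only, matching how the checklist reports its findings; how the underlying indicators and questions lead to a dimension score is not modelled here.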
Practical example, Metadata hyperdimension
Must have a specific use in mind!
Legend: +, good; o, reasonable; -, poor; ?, unclear
Data sources: IPA: Insurance Policy records Administration; SFR: Student Finance Register; CWI: register of Centre for Work and Income; ERR: Exam Results Register; NCP: National Car Pass register; PR: Persons Register; VAT: Value Added Tax data; ICP: Intra-Community Product transactions (EU-countries); NHR: New Housing register
Practical example, Metadata hyperdimension
• CWI scores ‘poor’ in Clarity
  • Definitions used by data source holder are difficult to understand
• CWI scores ‘poor’ in Comparability
  • Because its definitions are incomparable (and inconvertible) to the ones used by Statistics Netherlands
• Other data sources
  • ? for Data treatment indicates that processing by the data source holder needs more attention! (has improved)
  • Others are quite OK (there are always some things that can be improved)
Conclusions about the checklist
• Checklist as a tool:
  • Good way to assist the user, quite fast
  • Quality information on a basic but essential (meta-)level
  • Prevents users from missing important quality components
• Independent of the actual delivery of the data!
  • Nice feature, adds flexibility
  • A way to pre-evaluate data sources
General remarks / tips
• Use checklist to identify problems in Source and Metadata hyperdimension
• Do not immediately dive into the data!
• Problems in negative scoring dimensions need to be solved before studying the quality of the data
• Other less problematic issues can be solved later (at less hectic times)
• Considering the limited time needed to evaluate the Source and Metadata hyperdimensions, it is recommended to always start with these
• Repeat when needed
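The prioritisation in these tips can be sketched as a simple triage over dimension scores. This is a hedged illustration, not a prescribed procedure: the function names `triage` and `ready_for_data_study` are hypothetical, the example scores are made up, and treating "?" as "needs clarification" rather than as a blocking problem is an assumption (the slides only say that such scores need more attention).

```python
# Maps each legend symbol to an action group. The group names are
# illustrative; handling '?' as "clarify" is an assumption of this sketch.
ACTION_FOR_SCORE = {
    "-": "solve_first",   # poor: solve before studying the data quality
    "?": "clarify",       # unclear: needs more attention
    "o": "solve_later",   # reasonable: can wait for less hectic times
    "+": "ok",            # good: nothing to do
}


def triage(scores):
    """Group dimensions by the action their score calls for.

    `scores` maps dimension name -> one of '+', 'o', '-', '?'.
    """
    plan = {"solve_first": [], "clarify": [], "solve_later": [], "ok": []}
    for dimension, score in scores.items():
        if score not in ACTION_FOR_SCORE:
            raise ValueError(f"unknown score {score!r} for {dimension}")
        plan[ACTION_FOR_SCORE[score]].append(dimension)
    return plan


def ready_for_data_study(scores):
    """True when no dimension scores poor ('-')."""
    return not triage(scores)["solve_first"]


# Hypothetical scores for one data source, for illustration only.
example_scores = {"Delivery": "-", "Procedures": "o", "Clarity": "+",
                  "Data treatment": "?"}

print(triage(example_scores)["solve_first"])   # prints ['Delivery']
print(ready_for_data_study(example_scores))    # prints False
```

Re-running the triage after each round of fixes corresponds to the "repeat when needed" tip.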
Questions? Any questions or comments?
Introduction in exercise • Let’s try to interpret the findings of a Dutch ‘checklist’
Introduction in exercise
• Participants will be split into groups and each group is provided with:
  • The Source and Metadata evaluation results for an administrative data source
  • An intended use
• Each group will be asked to discuss:
  • Whether the source could be used for the purpose intended
  • If yes, why is everything OK?
  • If not, what problem(s) prevent its use and how can they be solved?