Explore the implementation of common validation rules in European statistical systems, focusing on innovative approaches to data validation. Learn about the valuable impact of a unified European infrastructure.
ESSnet ValiDat Foundation: Towards a common approach to data validation in Europe
Validation … what is that?
if employment status == “old-age pensioner” and age < 35 then error “Too young!”
0.5 < turnover(curMonth)/turnover(prevMonth) < 2
WENN ANZAHL VON Familie[ALLE].Person[MIT Alter < 18] > 0 DANN ... ENDE
IF maritalstate=married THEN Age>15 “Too young to be married” ENDIF
profit <= 0.6*revenue
Examples from: Lithuania, Germany, The Netherlands
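To make the example rules concrete, here is a minimal Python sketch of four of them (the German-language rule is left out because the slide elides its consequence). It is not taken from any of the national systems shown on the slide: the record layout, the field names (`employment_status`, `turnover_cur`, and so on), the messages for the turnover and profit rules, and the handling of a zero previous-month turnover are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Record = dict  # one survey record; all field names here are hypothetical
Rule = Callable[[Record], Optional[str]]  # returns an error message or None


def pensioner_age(rec: Record) -> Optional[str]:
    # if employment status == "old-age pensioner" and age < 35 then error
    if rec["employment_status"] == "old-age pensioner" and rec["age"] < 35:
        return "Too young!"
    return None


def married_age(rec: Record) -> Optional[str]:
    # IF maritalstate=married THEN Age>15
    if rec["marital_state"] == "married" and not rec["age"] > 15:
        return "Too young to be married"
    return None


def turnover_ratio(rec: Record) -> Optional[str]:
    # 0.5 < turnover(curMonth)/turnover(prevMonth) < 2
    if rec["turnover_prev"] == 0:
        # Assumption: the slide does not say what happens in this case.
        return "Previous turnover is zero"
    ratio = rec["turnover_cur"] / rec["turnover_prev"]
    if not 0.5 < ratio < 2:
        return "Turnover changed implausibly"
    return None


def profit_share(rec: Record) -> Optional[str]:
    # profit <= 0.6*revenue
    if not rec["profit"] <= 0.6 * rec["revenue"]:
        return "Profit too high relative to revenue"
    return None


PERSON_RULES: List[Rule] = [pensioner_age, married_age]
BUSINESS_RULES: List[Rule] = [turnover_ratio, profit_share]


def validate(rec: Record, rules: List[Rule]) -> List[str]:
    """Apply every rule and collect the messages of the rules that fail."""
    messages = []
    for rule in rules:
        message = rule(rec)
        if message is not None:
            messages.append(message)
    return messages


person = {"employment_status": "old-age pensioner",
          "marital_state": "married", "age": 14}
business = {"turnover_cur": 300.0, "turnover_prev": 100.0,
            "profit": 50.0, "revenue": 100.0}

person_errors = validate(person, PERSON_RULES)
business_errors = validate(business, BUSINESS_RULES)
```

With these sample records, the person record fails both person rules and the business record fails only the turnover rule. Writing each rule as a small function that returns a message keeps the rule text and the engine that applies it separate, which is the distinction the rest of the talk relies on.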
Validation rules from Eurostat (SBS, FSS)
Blaise – A Survey Processing System
Where do we do validation? Here.. Blaise – A Survey Processing System
Validation mess leads to data ping-pong between Eurostat and the NSI. The record seems to be 21 returns! (resending data files again and again)
New kid … VTL Will VTL solve the problem?
The ESSnet
So, in 2015 we are running an “ESSnet on validation”:
• Germany (leading)
• Italy
• Lithuania
• The Netherlands
• Eurostat
Work packages
A classical approach to a harmonization issue:
• Find out which approaches exist in the ESS (the SURVEY!)
• Build a sound base for a common methodology on validation
• Evaluate the proposed language (VTL)
• Identify the requirements for a European infrastructure
• Build a prototypical solution
• Motivate to participate!
The SURVEY
• The survey on validation practices is currently running
• Sent to all NSIs of the ESS
• Format: Excel workbook with only a very few validation rules incorporated
• Questions on the structure of the national statistical system, technical solutions and validation practices in five domains:
  • Census 2011
  • Labour Force Survey (LFS)
  • Structural Business Survey (SBS)
  • Agriculture Statistics
  • Price Statistics
• Return the questionnaire by 27 April
The SURVEY
What we would like to know:
• Which rules are being used?
• How are they documented?
• What effort is involved in the validation (and editing) process?
• Which tools/applications are used?
• How is the validation process organized?
The SURVEY
The survey will burden your colleagues considerably. So why take part at all? Because a European solution will have its full impact only if the different situations in the Member States are covered.
Methodological work package
• Whole international working groups are devoted to the methodology of data validation and editing
• A lot of work has been done at Eurostat in previous years
• What we are planning to do is to write a (short and practical) handbook on validation as a foundation for the discussion in the ESS (and beyond)
• Additional field of research: metrics. How can we be sure that a set of validation rules covers most of a survey and is efficient?
The LANGUAGE
The future validation language has two main goals:
• It should provide an unambiguous communication channel among specialists (humans!)
• It should feed different IT systems with the necessary specific information about a particular survey
These might be conflicting aims!
The handbook on methodology (“theory”) and the survey (“practice”) will give some ideas on what the language has to deal with
European infrastructure
• The main idea developed by Eurostat is a central repository of validation rules (stored in VTL)
• Rule engines/services should be able to connect to this repository and integrate the rules into their services
• Tools are needed to specify the rules in a user-friendly way
Is this all Utopia? In Germany we faced the same kind of challenge more than 15 years ago and developed an infrastructure with exactly these tools and services ... and it works (more or less)
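The repository idea can be illustrated with a small Python sketch. Everything in it is hypothetical: the class names `RuleRepository` and `RuleEngine`, the rule identifier `SBS-001`, and the in-memory storage are assumptions for illustration only. In the proposal described above the rules would be stored in VTL and served by a shared service; here plain Python callables stand in for them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    rule_id: str                    # hypothetical identifier scheme
    description: str                # human-readable rule text
    check: Callable[[dict], bool]   # True when the record passes


class RuleRepository:
    """Stand-in for a central store of validation rules, keyed by survey."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def publish(self, survey: str, rule: Rule) -> None:
        self._rules.setdefault(survey, []).append(rule)

    def rules_for(self, survey: str) -> List[Rule]:
        # Return a copy so engines cannot modify the stored rule list.
        return list(self._rules.get(survey, []))


class RuleEngine:
    """A service that fetches its rules from the repository at validation time."""

    def __init__(self, repository: RuleRepository) -> None:
        self._repository = repository

    def validate(self, survey: str, record: dict) -> List[str]:
        # Report the ids of all rules the record violates.
        return [rule.rule_id
                for rule in self._repository.rules_for(survey)
                if not rule.check(record)]


repo = RuleRepository()
repo.publish("SBS", Rule("SBS-001", "profit <= 0.6*revenue",
                         lambda r: r["profit"] <= 0.6 * r["revenue"]))

engine = RuleEngine(repo)
failed = engine.validate("SBS", {"profit": 90.0, "revenue": 100.0})
passed = engine.validate("SBS", {"profit": 10.0, "revenue": 100.0})
```

Because the engine only holds a reference to the repository, a rule published once is picked up by every engine connected to it, which is the property that would let Eurostat and the NSIs validate against the same rule set.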
Is it worth it?
Validation is the single most effort-consuming activity in statistical production (estimate for some domains of the German Statistical System: above 60 % of the total workload).
Yes, definitely!
If you (or colleagues from your office) are interested in the results of our project: there will be a workshop covering all these aspects of validation in Wiesbaden, Germany, on 10 and 11 November. Join it!