Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat
Grazia Di Bella, Simone Ambroselli (Istat, Italy)
Q2014 - European Conference on Quality in Official Statistics, Vienna, 2–5 June 2014
Prologue
The use of administrative data increases
Over the last decade, most NSIs have had to deal with a growing use of Administrative Data (AD) for statistical purposes. In Istat, the number of administrative data sets acquired for statistical use rose from 90 in 2009 to 230 in 2013, as many statistical processes already use them or plan to revise their production processes in this direction.
Action
AD Management strategy for efficiency and quality [1]
• Central level coordination
• A dedicated office named ADA (Administrative Data Acquisition and integration, under the Censuses and Statistical Registers Directorate) is responsible for the following tasks:
  • acquiring AD
  • storing AD
  • integrating AD (Integrated System of Microdata - SIM)
  • evaluating AD quality
  • making AD and their metadata available to internal statistics producers
AD Management strategy for efficiency and quality [2]
• Advantages
• Central coordination makes it possible to:
  • better ensure compliance with the legislation on data confidentiality
  • optimize timeliness and efficiency in acquiring AD and in making them accessible to users within the institute
  • unify common data treatments
  • provide a common description of AD quality through a Quality Report Card
  • facilitate the management of relationships with AD providers
  • activate the feedback needed to improve AD quality, in collaboration with AD producers.
AD Management strategy for efficiency and quality [3]
Figure: ADA functions (Acquisition procedures; AD quality evaluation; Integrated System of Microdata Repository SIM), statistical processes using AD, and dissemination to statistics users.
Generic Statistical Business Process Model (GSBPM)
ADA centralized functions
• 1.1 Identify data needs (considering potential of AD)
• 1.5 Check data availability
Acquisition procedures
• 2.3 Design collection
• 3.1 Build collection instrument
• 4.3 Run collection
• 4.4 Finalize collection
Integrated System of Microdata Repository - SIM
• Definition: repository of integrated administrative microdata to support the statistical production processes
• Goals
  • Make the AD accessible in a uniform way to users within the institute
  • Avoid duplicate work
Generic Statistical Business Process Model (GSBPM)
ADA centralized functions
SIM
• 5.1 Integrate data
• 5.2 Classify and code
AD Integration
The integration step is the process of linking objects recorded in different sources: individuals, economic units, places (in progress). Each object entering the SIM is identified with a unique ID number that is stable over time. Depending on the linking variable(s) available, a suitable integration strategy and a set of algorithms are applied. The data integration process feeds the DBs for integration, one for each subsystem. These DBs are warehouses of microdata that guarantee a unified view of the object under analysis by showing the information available on it in the different sources.
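As an illustration of the identification step described above, here is a minimal Python sketch of exact-match linkage on a single linking variable. It is not the SIM implementation: the `IdRegistry` class, the `integrate` function, the `tax_code` field and the source names are all hypothetical, and a real system would apply several strategies depending on the linking variables available.

```python
from itertools import count


class IdRegistry:
    """Hypothetical helper: one stable ID per distinct linking-key value."""

    def __init__(self):
        self._ids = {}
        self._next = count(1)

    def resolve(self, linking_key):
        # Normalise the key so trivial formatting differences do not
        # create two IDs for the same object.
        key = linking_key.strip().upper()
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]


def integrate(registry, source_name, records, key_field):
    """Return linkage rows: (source, serial number in source, object ID)."""
    links = []
    for serial, record in enumerate(records, start=1):
        value = record.get(key_field)
        if not value:
            continue  # no linking variable: left to another strategy
        links.append((source_name, serial, registry.resolve(value)))
    return links


# Two hypothetical sources sharing one object (tax code XYZ789).
registry = IdRegistry()
tax = [{"tax_code": "abc123"}, {"tax_code": "XYZ789"}]
pension = [{"tax_code": "XYZ789 "}, {"tax_code": None}]
tax_links = integrate(registry, "tax", tax, "tax_code")
pension_links = integrate(registry, "pension", pension, "tax_code")
```

Because the registry is shared across sources and kept between loads, an object seen again in a later delivery receives the same ID, which is the property the slide calls "stable over time".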
Figure: SIM architecture. Labels: administrative data sources (ADS 1, ADS 2, ..., ADS N); ETL process of AD; Quality (QRCA); Metadata system; Identification and Integration; DBs for the integration for each subsystem; virtual structures; physical structures; support in the development of the thematic DBs for statistical processes; SIM border; statistical processes (Statistical Registers, Statistical Information Systems).
SIM: the DBs for integration
The unique ID creates a spider-web structure of relationships, which guarantees that every source is connected with the DB for integration and, at the same time, with all the other sources in the same integration subsystem.
• Structure
  • the ID of the sources
  • the serial number of the object within each source in which it is recognized
  • the ID of the object in the integration subsystem
  • the variables used for the integration, for all the sources in which the object is present
  • the type of record linkage used to enter the object in the DB
  • the time reference for linkage validity
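The structure listed above can be sketched as a record type. This is an assumption-laden illustration, not the actual SIM schema: the `IntegrationRecord` class, its field names and the helper `linked_sources` are hypothetical, chosen only to mirror the six items on the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IntegrationRecord:
    source_id: str            # ID of the source
    serial_in_source: int     # serial number of the object within that source
    object_id: int            # ID of the object in the integration subsystem
    linking_values: tuple     # values of the variables used for integration
    linkage_type: str         # type of record linkage used (label is illustrative)
    valid_from: date          # time reference for linkage validity
    valid_to: Optional[date] = None


def linked_sources(records, source_id, serial_in_source):
    """Follow the shared object ID from one source record to the other sources."""
    for r in records:
        if r.source_id == source_id and r.serial_in_source == serial_in_source:
            others = {x.source_id for x in records if x.object_id == r.object_id}
            return sorted(others - {source_id})
    return []


# Hypothetical content: object 7 appears in three sources, object 8 in one.
db = [
    IntegrationRecord("tax", 12, 7, ("XYZ789",), "exact", date(2013, 1, 1)),
    IntegrationRecord("pension", 3, 7, ("XYZ789",), "exact", date(2013, 1, 1)),
    IntegrationRecord("health", 40, 7, ("ROSSI", "1970-05-02"), "probabilistic", date(2013, 6, 1)),
    IntegrationRecord("tax", 13, 8, ("ABC123",), "exact", date(2013, 1, 1)),
]
```

Calling `linked_sources(db, "tax", 12)` walks the "spider web": it starts from one source record and reaches every other source holding the same object through the shared `object_id`.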
SIM: the subsystems
Figure: SIM subsystems and their ID links. Labels: SIM UNITS (SIM INDIVIDUALS, SIM ECONOMIC UNITS, SIM PLACES); SIM RELATIONSHIPS (among individuals; among economic units; between individuals and economic units); ID pairs shown on the links: INDIVIDUAL ID – FAMILY ID; INDIVIDUAL ID – ECONOMIC UNIT ID; ECONOMIC UNIT ID – LOCAL UNITS ID; INDIVIDUAL ID – INDIVIDUAL PLACES ID.
AD quality evaluation in the Statistical production process
“AD quality” is considered in relation to the reuse of AD for statistical purposes, taking into account that AD are not primarily produced for statistical purposes.
Figure: input (survey data, administrative data, register data); data treatment (transformation function); statistical output. Labels: Q Producer (statistics oriented); Q User (statistics oriented); Quality Report Card for Administrative Data.
Quality Report Card for Administrative data – QRCA objectives
• Usability analysis (for Istat potential users): assess AD quality as an input of the statistical production process, in terms of its potential usability.
• AD monitoring function (for Istat current users): monitor AD for two main reasons: a) regulatory changes may induce discontinuities with significant impacts on statistics production; b) before AD enter the statistical production process, an analysis must be carried out to detect any unexpected lack of quality.
• Data supply monitoring function (for the AD acquisition process): check AD compliance with the requests and support the data loading process. Where appropriate, define alerts / warnings to optimize the timing of data acquisition and release.
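The data supply monitoring function can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the function name `check_supply`, the ALERT / WARNING levels, the seven-day threshold and the variable names are hypothetical and do not come from the slides.

```python
from datetime import date


def check_supply(requested_vars, delivered_vars, due, received, warn_days=7):
    """Compare a delivery with the request and return (level, message) pairs."""
    messages = []
    missing = sorted(set(requested_vars) - set(delivered_vars))
    if missing:
        messages.append(("ALERT", "missing variables: " + ", ".join(missing)))
    unexpected = sorted(set(delivered_vars) - set(requested_vars))
    if unexpected:
        messages.append(("WARNING", "unrequested variables: " + ", ".join(unexpected)))
    delay = (received - due).days
    if delay > warn_days:
        messages.append(("ALERT", f"delivered {delay} days late"))
    elif delay > 0:
        messages.append(("WARNING", f"delivered {delay} days late"))
    return messages


# Hypothetical delivery: one variable missing, one extra, three days late.
report = check_supply(
    requested_vars=["id", "income", "address"],
    delivered_vars=["id", "income", "phone"],
    due=date(2014, 3, 1),
    received=date(2014, 3, 4),
)
```

An empty list means the delivery matches the request and arrived on time; any ALERT would block loading in this sketch, while a WARNING would only be logged.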
AD quality framework
The AD quality framework takes a hierarchical and multidimensional approach. It covers both the issues directly connected with AD quality and the information needed by the AD management process, with the aim of improving the quality and usability of AD for statistics.
The framework adopted is based on the one originally defined by Statistics Netherlands [1] and then developed within the international BLUE-ETS project, WP4 [2].
[1] Daas et al. (2009). Checklist for the Quality evaluation of AD Sources. Discussion paper 09042, Statistics Netherlands.
[2] Daas et al. (2011). Reports on methods preferred for the quality indicators of administrative data sources. Deliverable 4.2 of Workpackage 4 of the BLUE-ETS project. CBS, Netherlands; SSB, Norway; Istat, Italy; SCB, Sweden.
AD quality framework for AD supplied (figure)
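The hierarchical shape of the framework (hyperdimensions containing dimensions, which in turn hold indicators) can be sketched as a nested mapping. The sketch is deliberately partial: it lists only the dimensions that these slides name under each hyperdimension, and the names `FRAMEWORK`, `hyperdimension_of` and `empty_report_card` are hypothetical.

```python
# Partial hierarchy: only dimensions named in these slides are listed.
FRAMEWORK = {
    "Source": ["Relevance"],
    "Metadata": ["Clarity", "Comparability"],
    "Data": [
        "Technical checks",
        "Integrability",
        "Completeness",
        "Time-related dimension",
    ],
}


def hyperdimension_of(dimension):
    """Return the hyperdimension that contains the given dimension."""
    for hyper, dimensions in FRAMEWORK.items():
        if dimension in dimensions:
            return hyper
    raise KeyError(dimension)


def empty_report_card(source_name):
    """Build an empty report card skeleton: one indicator slot list per dimension."""
    return {
        "source": source_name,
        "indicators": {
            hyper: {dim: [] for dim in dims} for hyper, dims in FRAMEWORK.items()
        },
    }


card = empty_report_card("hypothetical_source")
```

Keeping the hierarchy as data rather than code means a report card for any source has the same layout, which is what makes a common description of AD quality possible.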
QRCA and the ADA process
• Adapting the Quality framework and the QRCA to the Istat ADA process
• Implementing the QRCA through interoperability among ADA processes
To achieve adequate efficiency and timeliness, a system that automates the AD quality evaluation as far as possible is being planned. Following the OECD Core principles for metadata management, the strategy aims to take advantage of all the metadata available from the production process using AD, that is, to make metadata “active” to the greatest extent possible in support of the QRCA production.
Implementation of the Source Hyperdimension
The implementation of the Source Hyperdimension quality indicators takes advantage of all the information used for the AD acquisition procedures. For the moment this information is not managed in a fully automatic way, but Istat is moving in this direction by storing and organizing it according to the AD quality framework, so that it can be reused in the quality assessment process.
With respect to the Relevance quality dimension, a specific report for each main AD source is being finalized. It will provide information about all the statistical uses of the source (derived automatically from the acquisition procedures metadata) and about its compliance with the Istat requirements in terms of quality, timeliness and contents (derived from a very short questionnaire that could be submitted to AD source users).
Implementation of the Metadata Hyperdimension [1]
The Clarity quality dimension considers metadata that should be available from the data source holder. In this regard, a procedure is in place that should allow the metadata to be acquired together with the data, but it depends on strong collaboration with the AD holder.
In most cases definitions are deduced from the free-form metadata available. To make the system more efficient, these definitions could be shared among AD source users in the QRCA.
In addition, phase D, “Formal Concept Analysis / identification of objects and relations”, of the ADS, and the consequent data loading in the relational database, can make it possible to identify automatically the set of objects / entities to be evaluated.
Implementation of the Metadata Hyperdimension [2]
For Comparability, it should be possible to define a bridge linking the statistical units with the corresponding administrative units, and the output statistical variables with the administrative ones used in the production process.
Following the strategy of interoperability among systems, the new Istat Unified Metadata System (SUM) [8] could support the QRCA production. For processes already using AD, the traceability task (one of the SUM objectives) should be pursued by retaining process metadata, starting from the reference metadata that describe the content and quality of the statistical data produced and disseminated by Istat through the I.Stat Dissemination System.
Implementation of the Data Hyperdimension
ETL process metadata may support the Technical checks dimension (dataset readability, convertibility and compliance with the AD requested).
The AD integration process is documented in SIM. For the Integrability quality dimension, it is therefore possible to reuse these metadata to compute indicators describing the quality of the linking variable and, in general, the quality of the record linkage procedures.
Because objects in SIM have a unique ID number that is stable over time, Objects Alignment and Comparability indicators may be implemented by comparing AD from different sources in SIM, or AD with the main Statistical Registers. With respect to the latter, coverage indicators of the Completeness quality dimension may be implemented too, where possible.
Linking the same object in a dataset over time, using the SIM ID number, may produce Dynamics of objects indicators in the Time-related dimension.
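Once every object carries a SIM ID, indicators of this kind reduce to simple set operations on IDs. The Python sketch below shows the idea under simplifying assumptions: the function names and the exact formulas are illustrative and are not the indicator definitions of the framework cited in the slides.

```python
def linkage_rate(n_records, n_linked):
    """Share of source records that received an object ID (Integrability)."""
    return n_linked / n_records if n_records else 0.0


def coverage(source_ids, register_ids):
    """Compare the objects in a source with a reference register (Completeness)."""
    source_ids, register_ids = set(source_ids), set(register_ids)
    under = len(register_ids - source_ids) / len(register_ids) if register_ids else 0.0
    over = len(source_ids - register_ids) / len(source_ids) if source_ids else 0.0
    return {"undercoverage": under, "overcoverage": over}


def dynamics(ids_previous, ids_current):
    """Count objects entering, leaving and persisting between two deliveries."""
    prev, curr = set(ids_previous), set(ids_current)
    return {
        "births": len(curr - prev),
        "deaths": len(prev - curr),
        "continuing": len(prev & curr),
    }


# Hypothetical ID sets.
rate = linkage_rate(n_records=200, n_linked=180)
cov = coverage(source_ids={1, 2, 3, 5}, register_ids={1, 2, 3, 4})
dyn = dynamics(ids_previous={1, 2, 3}, ids_current={2, 3, 4, 5})
```

None of these functions needs the microdata themselves, only the ID lists and counts already stored by the integration process, which is why such indicators are good candidates for automatic production.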
Epilogue
• Many indicators may be implemented automatically!
• Others are not derivable using interoperability. This is the case for:
  • indicators of consistency checks (Accuracy dimension), whose implementation requires external intervention to define the check rules;
  • the AD source Relevance evaluation, concerning the compliance with the Istat requirements in terms of quality, timeliness and contents.
• The AD source users may provide this information insofar as they have an interest in sharing information on the AD they use. With this aim, some “AD source users groups” are being set up in Istat for the most important data source holders.
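To make the point about externally defined check rules concrete, here is a minimal Python sketch in which the rules are supplied from outside (for example by an AD source users group) and only the computation of the indicator is automated. The rule names, record fields and the function `inconsistency_rates` are hypothetical.

```python
def inconsistency_rates(records, rules):
    """For each named rule, return the share of records that violate it."""
    total = len(records)
    rates = {}
    for name, rule in rules.items():
        failed = sum(1 for record in records if not rule(record))
        rates[name] = failed / total if total else 0.0
    return rates


# Rules written by subject-matter experts, not derived from metadata.
rules = {
    "age_non_negative": lambda r: r["age"] >= 0,
    "employee_has_employer": lambda r: (
        r["status"] != "employee" or r["employer_id"] is not None
    ),
}

records = [
    {"age": 34, "status": "employee", "employer_id": 10},
    {"age": -1, "status": "retired", "employer_id": None},
    {"age": 51, "status": "employee", "employer_id": None},
    {"age": 20, "status": "student", "employer_id": None},
]

rates = inconsistency_rates(records, rules)
```

The split mirrors the slide: the rule dictionary is the part that needs human input, while the loop that applies it can run unattended on every new delivery.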