This article discusses the current situation, issues, and long-term targets for accessing EU confidential data for scientific research purposes. It outlines the legal framework, existing methods of access, and proposes architectural principles for future developments.
Future of access to EU confidential data for scientific purposes
Jean-Marc Museux, Eurostat – jean-marc.museux@ec.europa.eu
58th ISI conference, Dublin, August 2011
Aims
• Provide a vision for the future of access to EU microdata
• Lay down fundamental principles
• Identify constraints
• Provide an aligned view of ongoing and planned EU projects
Current situation – Legal framework
• Access to confidential data for scientific purposes is enabled by Article 23 of Framework Regulation 223/2009, which establishes EU statistics and the European Statistical System
• Rules and conditions of access are described in implementing Regulation 831/2002
• Access is granted mainly to European universities and research bodies
• Access is covered by contracts
• NSIs must give their consent for each research project
• Two modes of access are enabled: delivery of anonymised microdata to research institutions, and access to non-protected data in the Eurostat safe centre in Luxembourg
Current situation – Business process
• NSIs collect data according to harmonised definitions and methodologies, which form the basis of their national statistics
• NSIs transmit files to Eurostat for the production of EU statistics; in some domains (mainly household statistics) microdata are transmitted
• Methods for anonymisation are agreed with the 27 Member States
• Files are prepared by Eurostat
• Eurostat handles research requests and releases data according to appropriate procedures
Technology
• Secured transmission of flat files (single entry point, FTP-like), structural metadata and quality reports
• CD-ROM with anonymised data files
Issues
• Heavy procedures, centralised in Eurostat
• Lack of flexibility to adapt to national background: limited list of bodies allowed to get access, limited access modes
• Resources limiting the offer
• Only 6 of the 12 surveys enabled in Regulation 831 are available
• A maximum of 200 accesses handled per year; the rapid increase in demand (20% each year) cannot be met sustainably
• Technology developments overlooked (remote access, secure network over internet)
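To make the capacity issue concrete, the sketch below compounds the 20% yearly growth against the ceiling of 200 accesses per year. The growth rate and the ceiling come from the slide; the starting demand of 200 in year 0 is an assumption chosen only for illustration, and the function name is hypothetical.

```python
def projected_demand(base_requests: float, growth_rate: float, years: int) -> list:
    """Return projected yearly demand for years 0..years under compound growth."""
    return [base_requests * (1 + growth_rate) ** t for t in range(years + 1)]


CAPACITY = 200      # from the slide: maximum accesses handled per year
GROWTH = 0.20       # from the slide: demand grows by 20% each year
BASE_DEMAND = 200   # ASSUMPTION: demand equals capacity in year 0

demand = projected_demand(BASE_DEMAND, GROWTH, 5)

# Requests that could not be served if capacity stayed fixed.
unmet = [max(0.0, d - CAPACITY) for d in demand]

# First year in which demand is at least twice the fixed capacity.
doubling_year = next(t for t, d in enumerate(demand) if d >= 2 * CAPACITY)

for year, (d, u) in enumerate(zip(demand, unmet)):
    print(f"year {year}: demand {d:.0f}, unmet {u:.0f}")
print(f"demand reaches twice the capacity in year {doubling_year}")
```

Under these assumptions demand passes twice the fixed capacity within four years, which is the sense in which the current centralised process is not sustainable.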
Long term target – architectural principles
• Diversity of means of access, as a tool to better fit various needs and to monitor costs (public use files - online query system - anonymised files - fat remote access centre)
• Distribute access to as many points as possible, so as to improve accessibility of data to researchers
• Simplify procedures
• Minimise the number of data duplicates
• Integrate research use into the statistical value chain
• Maximise the value added of available data through the involvement of professional partners/agents
Long term target – architectural principles
• Develop shared standards and industrialisation to pave the way to mutualisation of decision making, interoperability of systems and efficient use of scarce resources
• Reuse existing infrastructure whenever possible
• Take decisions according to cost-benefit analysis
• Implement a risk management approach ensuring an adequate level of protection of individual information through a combination of safeguard measures along the 3 pillars: safe people, safe data, safe settings
Future Developments on European Level
Constraints
• Maintain an adequate level of interoperability with national provisions on the protection of statistical confidentiality
• Allow for some diversity in modalities, IT infrastructure and user support, as a source of emulation, innovation and adaptation to specific needs
• Maintain costs within operational budgets
• Ensure security and integrity of the whole system and of the data at all steps of the process
Long term vision - The `Schengen` approach
• The strategic objective is to consider all data collected under European legislation as a common good of the ESS
• Empower any NSI to grant access to all European data, provided that commonly agreed basic principles are met
• Enable access to the whole set of European data from any accredited entry point
• Set up a minimal coordination function
Solution
• Unique accreditation mechanism for institutions and researchers accessing EU datasets
• Distributed database with local versions of confidential data sets prepared by NSIs, credentials being set locally
• A central directory of files and access maintained by Eurostat
• Access to the network through a terminal server solution (remote access technology)
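The four elements of the proposed solution can be pictured as a small data structure: one accreditation list, a directory recording which node hosts a copy of which dataset, and a log of granted accesses. The sketch below is only an illustration of that idea. The class, method names, dataset name and node codes are all hypothetical, and nothing here describes an actual Eurostat system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CentralDirectory:
    """Hypothetical stand-in for the central directory of files and access."""

    accredited: set[str] = field(default_factory=set)               # researcher ids
    locations: dict[str, set[str]] = field(default_factory=dict)    # dataset -> hosting nodes
    access_log: list[tuple[str, str, str]] = field(default_factory=list)

    def accredit(self, researcher_id: str) -> None:
        """Single accreditation, valid at every entry point."""
        self.accredited.add(researcher_id)

    def register_copy(self, dataset: str, node: str) -> None:
        """Record that a node hosts a local version of a dataset."""
        self.locations.setdefault(dataset, set()).add(node)

    def request_access(self, researcher_id: str, dataset: str, entry_point: str) -> str:
        """Return the node that will serve the request, and log the access."""
        if researcher_id not in self.accredited:
            raise PermissionError(f"{researcher_id} is not accredited")
        nodes = self.locations.get(dataset)
        if not nodes:
            raise LookupError(f"no registered copy of {dataset}")
        # Prefer a copy held at the entry point; otherwise use another node
        # (sorted only to keep the illustration deterministic).
        node = entry_point if entry_point in nodes else sorted(nodes)[0]
        self.access_log.append((researcher_id, dataset, node))
        return node


directory = CentralDirectory()
directory.accredit("researcher-1")                 # illustrative id
directory.register_copy("survey_A", "DE")          # illustrative dataset and nodes
directory.register_copy("survey_A", "NL")

served_by = directory.request_access("researcher-1", "survey_A", entry_point="NL")
remote = directory.request_access("researcher-1", "survey_A", entry_point="HU")
```

In this toy model a researcher entering through a node without a local copy is routed to another node, which corresponds to the remote access role of the terminal server solution.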
Barriers
• Need for a paradigm shift from exclusive ownership of data by NSIs to common ownership and shared responsibilities for EU data; Framework Regulation 223 will probably have to be changed to make these fundamental principles explicit
• A careful cost-benefit analysis still has to be developed, and an agreed model for cost/burden sharing still has to emerge
• A smooth transition should take place step by step
Stepwise approach – first steps
• Feasibility study (2009-2010) done by a network of NSIs (ESSnet: DE, IT, UK, NL, HU)
• Change in the implementing Regulation 831 (2010-2011)
• New methodological solutions for protection of microdata in a distributed environment (2011-2012)
• Pilot on limited network infrastructure (2012-2014)
• Infrastructure including data archives and NSIs (FP7 research infrastructure project 2011-2016)
• ….
1. Revision of implementing EU regulation
• EU data accreditation not limited to EU universities and research bodies
• Enabling new modes of access (remote access)
• Enabling involvement of external partners (data archive,…)
• Establishing new and cost-effective procedures
• Allowing some flexibility in incorporating new standards
2. New model for micro data protection in a distributed environment
• Objective criteria based on disclosure risk and data utility (information loss); proof of concept run by ESSnet (IT, NL, DE)
• Standards ensuring data usability (documentation, code lists, format ….)
• Guidelines and threshold levels for protection of EU datasets
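The trade-off between disclosure risk and information loss can be illustrated with a toy example. The sketch below uses a k-anonymity style frequency count as the risk measure and the share of distinct values lost by recoding as a crude utility measure. These are common textbook choices swapped in for illustration, not the criteria developed by the ESSnet, and the records, variable names and threshold k = 2 are all made up.

```python
from collections import Counter


def share_at_risk(records, keys, k):
    """Share of records whose combination of key variables occurs fewer than k times."""
    combos = Counter(tuple(r[key] for key in keys) for r in records)
    risky = sum(1 for r in records if combos[tuple(r[key] for key in keys)] < k)
    return risky / len(records)


def recode_age(record, width):
    """Replace exact age by an age band of the given width (global recoding)."""
    lower = (record["age"] // width) * width
    return {**record, "age": f"{lower}-{lower + width - 1}"}


# Made-up records; 'age' and 'region' play the role of identifying variables.
records = [
    {"age": 31, "region": "A"},
    {"age": 34, "region": "A"},
    {"age": 38, "region": "A"},
    {"age": 52, "region": "B"},
    {"age": 57, "region": "B"},
    {"age": 71, "region": "B"},
]
keys = ["age", "region"]

risk_raw = share_at_risk(records, keys, k=2)
recoded = [recode_age(r, width=10) for r in records]
risk_recoded = share_at_risk(recoded, keys, k=2)

# Crude information-loss proxy: share of distinct age values lost by recoding.
distinct_raw = len({r["age"] for r in records})
distinct_recoded = len({r["age"] for r in recoded})
info_loss = 1 - distinct_recoded / distinct_raw

print(f"risk before: {risk_raw:.2f}, after: {risk_recoded:.2f}, info loss: {info_loss:.2f}")
```

Recoding lowers the share of records at risk at the price of less detailed ages. A threshold level for an EU dataset would fix how much risk is acceptable for a given access mode.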
3. Pilot solution
• Specific secure environment to host confidential data managed by Eurostat (SICON)
• Remote access using a terminal server solution from NSI data centres (UK, FR, HU, DE, PT)
• Feasibility and cost-benefit of extending the network
• Refining the cost model
• Tuning procedures and standards for decentralisation
• Output checking
• Researcher supervision and support
• Building mutual trust among partners
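Output checking in a remote access setting is often described in terms of simple rules applied to results before they leave the secure environment. The sketch below shows one such rule, a minimum cell count for frequency tables. The rule, the threshold of 3 and the function name are assumptions for illustration only; the slides do not specify which checks the pilot would apply.

```python
from typing import Dict, Optional


def check_frequency_table(table: Dict[str, int], min_count: int) -> Dict[str, Optional[int]]:
    """Return a copy of the table with cells below min_count suppressed (set to None)."""
    return {cell: (n if n >= min_count else None) for cell, n in table.items()}


# Made-up researcher output: counts per category.
output = {"category_1": 125, "category_2": 2, "category_3": 40}

released = check_frequency_table(output, min_count=3)  # threshold is hypothetical
suppressed_cells = [cell for cell, n in released.items() if n is None]
print(released)
```

A rule like this covers primary suppression only. If totals are released alongside the cells, a suppressed value may still be recoverable by subtraction, so a real procedure would need further checks.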
Conclusions
• Continuous monitoring and integration of the results of the different projects
• Cost-benefit analysis at each step
• Alignment with the vision
• Discussion with NSI partners