Development of UK Virtual Microdata Laboratory. Felix Ritchie Shanghai, March 2010. Plan of presentation. Starting principles What we did, and the impact New things we had to develop security model, researcher management, SDC What we’ve learnt
Development of UK Virtual Microdata Laboratory • Felix Ritchie • Shanghai, March 2010
Plan of presentation • Starting principles • What we did, and the impact • New things we had to develop • security model, researcher management, SDC • What we’ve learnt • what matters, what doesn’t, what we’d do differently • Future directions
Starting principles • Designed by researchers for research • maximum access, limited by law • Expandable • Secure at reasonable cost • Manageable at reasonable cost • Distribute access, not data
Distributed access • Why is this good? • Data always under ONS control • Live monitoring • Simpler, but safer, disclosure control • How does this work in practice? • VML accessible from all ONS computers • Access points in govt. offices in Glasgow and Belfast • Plan to roll out to more govt. offices in 2010 • VML-duplicate set up on academic network • VML set to become exception rather than default data store
What we did • Central data repository and processors • Access via secured thin clients • Work space partitioned by dataset, not usage • researchers get access to dataset, not variables • No access to internet or rest of network • Same system for internal and external users
What we did - outcomes • 30%-50% growth every year • Massive increase in microeconomic analysis • From almost no firm-level studies to European leaders • Keystone of ONS Administrative Data Project • Total cost ~£350,000 per year • strategy 17%, fixed ops 65%, variable ops 18% • income ~£50,000
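The cost figures on this slide can be turned into rough annual amounts. The sketch below is only illustrative arithmetic: the total and income are the approximate values quoted on the slide, and the names `TOTAL_COST`, `INCOME`, `SHARES_PCT` and `cost_breakdown` are hypothetical, introduced here for the example.

```python
# Approximate figures taken from the slide (both are quoted with "~").
TOTAL_COST = 350_000   # pounds per year
INCOME = 50_000        # pounds per year
SHARES_PCT = {"strategy": 17, "fixed ops": 65, "variable ops": 18}


def cost_breakdown(total, shares_pct):
    """Split a total into components given whole-number percentage shares."""
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


breakdown = cost_breakdown(TOTAL_COST, SHARES_PCT)
net_cost = TOTAL_COST - INCOME

for name, amount in breakdown.items():
    print(f"{name}: ~£{amount:,}")
print(f"net cost after income: ~£{net_cost:,}")
```

On these figures, fixed operations account for roughly £227,500 of the total, and income covers about one seventh of the annual cost.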
New things developed (1) The VML Security Model • valid statistical purpose → safe projects • trusted researchers → safe people • anonymisation of data → safe data • technical controls around data → safe setting • disclosure control of results → safe outputs • safe projects + safe people + safe data + safe setting + safe outputs ⇒ safe use
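One way to read the security model is that safe use requires all five elements together, so weakness in any one of them needs attention. The sketch below assumes each element can be reduced to a yes/no judgement, which is a simplification made here for illustration and not something the slide states. The names `UseAssessment`, `failing_dimensions` and `is_safe_use` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class UseAssessment:
    """Yes/no judgement on each element of the model (an assumed simplification)."""
    safe_projects: bool  # valid statistical purpose
    safe_people: bool    # trusted researchers
    safe_data: bool      # anonymisation of data
    safe_setting: bool   # technical controls around data
    safe_outputs: bool   # disclosure control of results


def failing_dimensions(assessment):
    """Return the names of the elements that are not judged safe."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


def is_safe_use(assessment):
    """Safe use holds only when every element is judged safe."""
    return not failing_dimensions(assessment)


ok = UseAssessment(True, True, True, True, True)
untrained = UseAssessment(True, False, True, True, True)

print(is_safe_use(ok))                # True
print(failing_dimensions(untrained))  # ['safe_people']
```

Reporting which element fails, instead of a bare pass or fail, reflects the idea that the elements are assessed together and that a gap in one can be addressed directly.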
New things developed (2) Output statistical disclosure control • ‘Standard’ SDC not appropriate • traditional rules not appropriate for research environments • SDC on data or methods pointless • Principles-based output SDC • SDC at the point of release • trained researchers • trained staff • agreement on principles and purpose • safe vs unsafe outputs, based on functional form
New things developed (3) Active researcher management • Need to develop shared objectives with researchers • Principles-based SDC needs buy-in from researchers • Reduced management costs • Compulsory training • SDC • VML objectives and constraints • legal and procedural background
What we’ve learnt (1) Things that matter • attitude to researchers • model of SDC • broad scale of operations • including future plans • scale of coherent networks • (for remote access) • e.g. ONS internal network, Government Secure Intranet, University Intranet, VPN?
What we’ve learnt (2) Things that don’t matter • Location of servers and users • Type of users • Type of data • IT • Metadata • Specific legal/procedural framework?
What we’ve learnt (3) Things we would do differently • Prepare ONS for expansion • senior buy-in • IT planning • better data management • better user management • better metadata
Future directions • Expansion across the government network • Supporting academic equivalent • VML facing massive internal increase in use • Developing international standards • Better communication • wikis, FAQs, common metadata system • metadata • Not being considered • remote job systems • synthetic data
Questions? Felix Ritchie felix.ritchie@ons.gsi.gov.uk Microdata Analysis and User Support maus@ons.gsi.gov.uk
The data model (1) • ‘Spectrum’ of access points balancing • value of data • ease of use • disclosure risk • for a given level of confidentiality, maximise data use and convenience • no ‘one-size-fits-all’ solution • no absolute prohibitions • trade-off is made explicit • users determine appropriate level of access